




THE LARGE-SAMPLE POWER OF TESTS BASED ON PERMUTATIONS 
OF OBSERVATIONS¹


By Wassily Hoeffding
University of North Carolina 


Summary. The paper investigates the power of a family of nonparametric 
tests which includes those known as tests based on permutations of observations. 
Under general conditions the tests are found to be asymptotically (as the sample
size tends to $\infty$) as powerful as certain related standard parametric tests. The
results are based on a study of the convergence in probability of certain random 
distribution functions. A more detailed summary will be found at the end of the 
Introduction. 


1. Introduction. Let $X$ be a random variable whose values are points $x$ in a
space $\mathcal{X}$. The probability distribution of $X$ is characterised by the probability
measure $P(A) = \Pr\{X \in A\}$, defined on an additive class $\mathcal{A}$ of subsets $A$ of $\mathcal{X}$.
(In the applications to be considered $\mathcal{X}$ can be taken as a finite-dimensional
Euclidean space, $\mathcal{A}$ as the family of Borel sets.) Let $\mathcal{G}$ be a finite group of
transformations $g$ of $\mathcal{X}$ onto itself which also map $\mathcal{A}$ onto itself. Thus, for every $g$ in
$\mathcal{G}$, every $x$ in $\mathcal{X}$ and every $A$ in $\mathcal{A}$, the point $gx$ is in $\mathcal{X}$ and the set $gA$ of points
$gx$, $x \in A$, is in $\mathcal{A}$. Let $M$ be the number of elements in $\mathcal{G}$. Let $H$ be a hypothesis
which implies that the distribution of $X$ is invariant under the transformations
in $\mathcal{G}$, so that for every $g$ in $\mathcal{G}$, $gX$ has the same distribution as $X$.

For example, let $\mathcal{X}$ be the $n$-dimensional Euclidean space, and let $H$ be the
hypothesis that the components $X_1, \ldots, X_n$ of $X$ are independent, each $X_i$
being symmetrically distributed about the median 0. Then $H$ implies that the
distribution of $X$ is invariant under changes of sign of the $X_i$. Here $M = 2^n$.
Alternatively, if $X_1, \ldots, X_m, \ldots, X_n$ are independent, $X_1, \ldots, X_m$ have a
common distribution and $X_{m+1}, \ldots, X_n$ have a common distribution, then the
distribution of $X$ is invariant under the $M = m!\,(n - m)!$ permutations which
permute the first $m$ or the last $n - m$ components.

All real-valued functions of $x$ to be considered are understood to be measurable
($\mathcal{A}$). The expected value of a function $f(X)$ when $X$ has distribution $P$ will be
denoted by $E_P f(X)$ or $Ef(X)$.

By a test of $H$ we shall mean a function $\phi(x)$, $0 \le \phi(x) \le 1$, which expresses
the probability with which $H$ is rejected when $X$ takes the value $x$. The power
of the test $\phi$ with respect to $P$ (the unconditional probability of rejecting $H$
when $P$ is the true distribution and test $\phi$ is used) is equal to $E_P\phi(X)$. If $E_P\phi(X) = \alpha$
whenever $H$ is true, the test $\phi$ is said to be similar of size $\alpha$ for testing $H$.

This paper will be mainly concerned with tests of the following type. Let
$t(x)$ be a real-valued function on $\mathcal{X}$. For every $x \in \mathcal{X}$ let

$$t^{(1)}(x) \le t^{(2)}(x) \le \cdots \le t^{(M)}(x)$$


1 Work done under the sponsorship of the Office of Naval Research. 


be the ordered values of $t(gx)$ for all $g$ in $\mathcal{G}$. Given a number $\alpha$, $0 < \alpha < 1$, let $k$
be defined by

$$k = M - [M\alpha],$$

where $[M\alpha]$ denotes the largest integer less than or equal to $M\alpha$. Let $M^{+}(x)$ and
$M^{0}(x)$ be the numbers of values $t^{(j)}(x)$ $(j = 1, \ldots, M)$ which are greater than
$t^{(k)}(x)$ and equal to $t^{(k)}(x)$, respectively, and let

$$a(x) = \frac{M\alpha - M^{+}(x)}{M^{0}(x)}.$$

Since $M^{+}(x) \le M - k \le M\alpha$ and $M^{+}(x) + M^{0}(x) \ge M - k + 1 > M\alpha$, we
have $0 \le a(x) < 1$.
Let the test $\phi(x)$ be defined by

$$\phi(x) = \begin{cases} 1 & \text{if } t(x) > t^{(k)}(x), \\ a(x) & \text{if } t(x) = t^{(k)}(x), \\ 0 & \text{if } t(x) < t^{(k)}(x). \end{cases} \tag{1.1}$$

For every $x \in \mathcal{X}$ we have

$$\sum_g \phi(gx) = M^{+}(x) + a(x)M^{0}(x) = M\alpha,$$

where $\sum_g$ stands for summation over all $g$ in $\mathcal{G}$. If the distribution $P$ of $X$ is
invariant under all $g$ in $\mathcal{G}$, we have

$$M\alpha = E_P \sum_g \phi(gX) = \sum_g E_P\,\phi(X) = M E_P\,\phi(X).$$

Hence the test $\phi$ is similar of size $\alpha$ for testing $H$.
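The construction of the test (1.1) can be sketched in a few lines of Python. This is a minimal illustration rather than anything taken from the paper: the function names `sign_flip_orbit` and `permutation_test` are hypothetical, the group is assumed to be the sign-change group of the first example above, the statistic is passed in by the caller, and exact rational arithmetic is used so that ties with $t^{(k)}(x)$ are detected reliably.

```python
from fractions import Fraction
from itertools import product
from math import floor

def sign_flip_orbit(x):
    """All M = 2^n images gx of x under the sign-change group (assumed example group)."""
    return [tuple(s * xi for s, xi in zip(signs, x))
            for signs in product((1, -1), repeat=len(x))]

def permutation_test(x, t, orbit, alpha):
    """Return phi(x) of (1.1): the probability of rejecting H at the point x."""
    values = sorted(t(gx) for gx in orbit(x))   # t^(1)(x) <= ... <= t^(M)(x)
    M = len(values)
    k = M - floor(M * alpha)                    # k = M - [M alpha]
    crit = values[k - 1]                        # t^(k)(x), using a 1-based index
    m_plus = sum(v > crit for v in values)      # M^+(x)
    m_zero = sum(v == crit for v in values)     # M^0(x)
    a = (M * alpha - m_plus) / m_zero           # a(x), with 0 <= a(x) < 1
    tx = t(x)
    if tx > crit:
        return 1
    if tx == crit:
        return a
    return 0
```

Summing `permutation_test(gx, ...)` over the orbit of a point reproduces the identity $\sum_g \phi(gx) = M\alpha$, which is the step that yields similarity.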

Tests which are essentially of the form (1.1) have been considered by R. A.
Fisher [3], Pitman [11], Welch [14]. Lehmann and Stein [8] have shown that tests
of this type, with suitable functions $t(x)$, are most powerful (or most powerful
similar, etc.) for testing certain nonparametric hypotheses $H$ against specified
alternatives.

A test of the form (1.1) differs from a conventional test mainly in that the
"critical value," $t^{(k)}(X)$, is a random variable. This circumstance makes the
exact evaluation of its power function difficult. It will, however, be shown that
under certain conditions $t^{(k)}(X)$ is close to a constant with high probability.
Then the power of the test can be approximated in terms of the distribution
function of $t(X)$.

More precisely, suppose that the objects so far considered, $\mathcal{X} = \mathcal{X}_n$, $\mathcal{G} = \mathcal{G}_n$,
$t(x) = t_n(x)$, etc., are defined for an infinite sequence of positive integers $n$. It
will be assumed that the size $\alpha$ of the test is fixed and that $M \to \infty$ as $n \to \infty$.
Then

$$k/M \to 1 - \alpha \quad \text{as } n \to \infty.$$

Suppose that for a given sequence $\{P_n\}$ of distributions of $X = X^{(n)}$ the following
two conditions are satisfied:







CONDITION A. There exists a constant $\lambda$ such that $t_n^{(k)}(X) \to \lambda$ in probability.
CONDITION B. There exists a function $H(y)$, continuous at $y = \lambda$, such that for
every $y$ at which $H(y)$ is continuous

$$\Pr\{t_n(X) \le y\} \to H(y).$$

From (1.1) we have

$$\Pr\{t_n(X) > t_n^{(k)}(X)\} \le E_{P_n}\phi_n(X) \le \Pr\{t_n(X) \ge t_n^{(k)}(X)\}. \tag{1.2}$$

Hence it follows that Conditions A, B imply

$$E_{P_n}\phi_n(X) \to 1 - H(\lambda). \tag{1.3}$$


It should be noted that the function $t(x)$ in the definition (1.1) of $\phi(x)$ can be
replaced by any function $t'(x)$ such that for every $x$ in $\mathcal{X}$ and every two elements
$g$, $g'$ of $\mathcal{G}$ the difference $t'(gx) - t'(g'x)$ has the same sign as $t(gx) - t(g'x)$. For
example, this is true for $t'(x) = c(x)f(t(x)) + d(x)$, where $f(y)$ is an increasing
function, $c(x) > 0$, and $c(x)$, $d(x)$ are invariant under $\mathcal{G}$ (cf. Lehmann and Stein
[8]). Thus if Conditions A, B are not satisfied, they may possibly be satisfied
after $t_n(x)$ has been replaced by a suitable function $t_n'(x)$.

In general $\lambda$ and $H(y)$ will depend on the sequence $\{P_n\}$. It will, however, be
seen that the dependence of $\lambda$ on $\{P_n\}$ is much less pronounced than that of
$H(y)$, in the sense that for a class $\mathcal{C}$ of sequences $\{P_n\}$ the value $\lambda$ is the same
while $1 - H(\lambda)$ ranges from $\alpha$ to 1.

For every $x$ in $\mathcal{X}$ let $MF_n(y, x)$ be the number of elements $g$ in $\mathcal{G}$ for which
$t_n(gx) \le y$. For $x$ fixed, $F_n(y, x)$ is a distribution function. Suppose that for some
sequence $\{P_n\}$ the following condition is satisfied:

CONDITION A'. $F_n(y, X) \to F(y)$ in probability for every $y$ at which $F(y)$ is
continuous, where $F(y)$ is a distribution function, the equation $F(y) = 1 - \alpha$ has a
unique solution $y = \lambda$, and $F(y)$ is continuous at $y = \lambda$.

It will be shown in Section 3 that A' implies that $t_n^{(k)}(X) \to \lambda$ in probability,
so that A is satisfied with $\lambda$ as defined in A'; furthermore, if $H$ is true for every
$P_n$ of the sequence, $t_n(X)$ has the limiting distribution function $F(y)$.
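For small $M$ the function $F_n(y, x)$ can be evaluated by direct enumeration, which makes Condition A' easy to inspect numerically. The sketch below is only illustrative: it assumes the sign-change group and the statistic $t(x) = \sum x_i (\sum x_i^2)^{-1/2}$ that the paper treats in Section 4, the name `F_n` is a hypothetical helper, and the comparison with the normal distribution function in the usage note is a rough check at one fixed point $x$, not a proof of convergence.

```python
from itertools import product
from math import sqrt

def F_n(y, x):
    """Fraction of the 2^n sign changes g for which t(gx) <= y (assumed statistic)."""
    norm = sqrt(sum(xi * xi for xi in x))       # invariant factor (sum x_i^2)^(1/2)
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(x)):
        total += 1
        if sum(s * xi for s, xi in zip(signs, x)) / norm <= y:
            count += 1
    return count / total
```

For instance, with `x = tuple(range(1, 13))` the values `F_n(y, x)` lie close to `statistics.NormalDist().cdf(y)` for moderate `y`, up to the jumps caused by the discreteness of the permutation distribution.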

Let $\phi_n^*$ be a test of the conventional form $\phi_n^*(x) = 1$, $a_n^*$, or 0 according as
$t_n(x) > \lambda_n$, $= \lambda_n$, or $< \lambda_n$, where $0 \le a_n^* \le 1$ and $\lambda_n$ is a constant. Suppose
that $\lambda_n$ and $a_n^*$ are so chosen that the test $\phi_n^*$ has size $\alpha$ for testing that $P_n = P_n^*$,
a distribution for which $H$ is true. It follows from the preceding paragraph that
if A' is satisfied for $\{P_n^*\}$, then $\lambda_n \to \lambda$. Moreover, if B holds,

$$E_{P_n}\phi_n^*(X) \to 1 - H(\lambda). \tag{1.4}$$

Hence if $\mathcal{C}(\lambda)$ denotes the class of all sequences $\{P_n\}$ for which A', with $\lambda$
fixed, and B, with some $H(y)$, are satisfied, and if $\mathcal{C}(\lambda)$ contains $\{P_n^*\}$, then the
powers of the tests $\phi_n$ and $\phi_n^*$ tend to the same limit for every $\{P_n\}$ in $\mathcal{C}(\lambda)$. The
nonparametric test $\phi_n$ can be said to be asymptotically as powerful with respect
to $\mathcal{C}(\lambda)$ as $\phi_n^*$. This result will be of particular interest when $\phi_n^*$ is a most powerful,
or otherwise "optimum," parametric test, as in the examples of this paper.







It also can happen that for different sequences $\{P_n\}$, $t_n^{(k)}(X)$ converges to different
values $\lambda$, but in every case the test $\phi_n$ is asymptotically as powerful as the
most powerful test for a parametric family of distributions to which $P_n$ belongs.
This point will be illustrated in Section 7.

In most applications to be considered, $H(y)$ is either a (cumulative) distribution
function, or $H(y) = 0$. In the latter case the relations (1.3) and (1.4) merely
imply that both tests are consistent (have limiting power 1). The case $0 < H(\lambda) < 1$
will usually occur when $P_n$ approaches, in a certain sense, the null hypothesis.
For example, let $P_n$ be the distribution of two independent random samples
of $m$ and $n - m$ observations from two normal distributions with means $\mu_1 \le \mu_2$
and common variance $\sigma^2$. Let $\mathcal{G}$ consist of the $M = n!$ permutations of the $n$
observations. Let $t_n(x)$ be the standard $t$-statistic for two samples. The results
of Section 6 imply that Condition A' is satisfied with $F(y) = \Phi(y)$, where

$$\Phi(y) = (2\pi)^{-1/2} \int_{-\infty}^{y} e^{-t^2/2}\, dt. \tag{1.5}$$

Condition B is satisfied with $H(y) = \Phi(y - c)$ if $(\mu_2 - \mu_1)\sigma^{-1}\{m(n - m)/n\}^{1/2}$
tends to a finite limit $c$. This will not be the case if, as is frequently assumed,
$m/n \to p$, $0 < p < 1$, and $\delta = (\mu_2 - \mu_1)\sigma^{-1}$ is independent of $n$. In this case one
can, however, conclude that if $\delta$ is sufficiently small, the number $N$ of observations
required to achieve the power $1 - \Phi(\lambda - c)$ is approximately given by
$\delta\{p(1 - p)N\}^{1/2} = c$, and this is true for either test. In this sense the asymptotic
relative efficiency of the two tests is arbitrarily close to one for $\delta$ sufficiently small.
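The two formulas in this example are easy to evaluate numerically. The following sketch assumes only the relations stated above, namely limiting power $1 - \Phi(\lambda - c)$ with $\Phi(\lambda) = 1 - \alpha$, and the approximation $\delta\{p(1-p)N\}^{1/2} = c$; the function names `limiting_power` and `approx_sample_size` are hypothetical.

```python
from statistics import NormalDist

def limiting_power(alpha, c):
    """Limiting power 1 - Phi(lambda - c), where Phi(lambda) = 1 - alpha."""
    lam = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(lam - c)

def approx_sample_size(delta, p, c):
    """Solve delta * sqrt(p * (1 - p) * N) = c for N (small-delta approximation)."""
    return c ** 2 / (delta ** 2 * p * (1 - p))
```

With `c = 0` the limiting power reduces to the size `alpha`, and for a fixed target `c` the approximate sample size grows like `1 / delta**2` as the standardized difference shrinks.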

The main object of this paper is to indicate several methods for ascertaining
that Condition A is satisfied. By way of illustration the methods are applied
to a number of tests which have been considered in the literature. In Section 2
bounds for $t^{(k)}(x)$ are obtained which provide a simple criterion for consistency.
Sufficient conditions for the convergence to zero of the variance of the random
variable $F_n(y, X)$ are given (Section 3) and used to obtain the large-sample
power of several tests (Sections 4-7). The remaining Sections 8-10 show how
a theorem can be applied which gives sufficient conditions for the convergence
of $F_n(y, x^{(n)})$, for a sequence of fixed values $x^{(n)}$. The fulfilment of these conditions
in probability for a sequence of random variables $X^{(n)}$ is found to be sufficient
for the convergence in probability of $F_n(y, X^{(n)})$. An extension to random
distributions of the second limit theorem of probability theory (Section 10)
generalizes a recent result of Ghosh [6].


2. Bounds for $t^{(k)}(x)$; consistency. In this section it will be shown that, given
a test $\phi(x)$ of the form (1.1), the function $t(x)$ can always be so chosen that one
or two moments of the distribution function $F_n(y, x)$ are (essentially) fixed for
all $x$, and the critical value $t^{(k)}(x)$ is confined to a finite interval which depends
only on $\alpha$.

Let $G$ be a random variable whose values are the $M$ elements $g$ of $\mathcal{G}$, each
element having the same probability $M^{-1}$. Then $F_n(y, x)$, as defined in Section
1, is the distribution function of the random variable $t(Gx)$.







Let $m(x)$ and $v(x)$ denote the mean and the variance of $t(Gx)$, so that

$$m(x) = M^{-1} \sum_g t(gx), \qquad v(x) = M^{-1} \sum_g [t(gx) - m(x)]^2.$$

Let $t'(x) = v(x)^{-1/2}[t(x) - m(x)]$ if $v(x) > 0$, $t'(x) = 0$ if $v(x) = 0$. Then the test
$\phi(x)$ in (1.1) is not changed if $t(x)$ is replaced by $t'(x)$. Thus we may always assume
that the distribution function $F_n(y, x)$ has mean 0 and variance less than
or equal to 1. If a probability limit $F(y)$ of $F_n(y, X)$ exists for all $y$, then $F(y)$
is a distribution function with the same properties. If, moreover, the probability
of $t(gX) = t(X)$ for all $g$ in $\mathcal{G}$ tends to 0 as $n \to \infty$, then the probability of
$v(X) = 0$ tends to 0, and $F(y)$ has variance 1. In a similar way, if $t(x) \ge 0$, we
may, for instance, replace $t(x)$ by a function $t'(x)$ such that $t'(x) \ge 0$ and
$Et'(Gx) = c$, an arbitrary positive constant.
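THEOREM 2.1. If $t(x) \ge 0$, $Et(Gx) = c > 0$, then

$$0 \le t^{(k)}(x) \le \frac{c}{\alpha}. \tag{2.1}$$

If $Et(Gx) = 0$, $Et(Gx)^2 \le 1$, then

$$-\left(\frac{\alpha}{1 - \alpha}\right)^{1/2} \le t^{(k)}(x) \le \left(\frac{1 - \alpha}{\alpha}\right)^{1/2}. \tag{2.2}$$

PROOF. We have

$$MF_n(t^{(k)}(x) - 0, x) \le k - 1 < M - M\alpha \le k \le MF_n(t^{(k)}(x), x),$$

so that

$$F_n(t^{(k)}(x) - 0, x) < 1 - \alpha \le F_n(t^{(k)}(x), x).$$

If $t(x) \ge 0$, $Et(Gx) = c$, then for every $z > 0$

$$1 - F_n(z - 0, x) = \Pr\{t(Gx) \ge z\} \le \frac{c}{z}.$$

Hence (2.1).

If $Et(Gx) = 0$, $Et(Gx)^2 = c^2 \le 1$, relation (2.2) follows in a similar way by
using the inequalities of Tchebycheff-Cantelli (see, e.g., [4], p. 126 or [12], p. 198)

$$F_n(y, x) \le \frac{c^2}{c^2 + y^2} \quad \text{if } y < 0,$$

$$F_n(y - 0, x) \ge 1 - \frac{c^2}{c^2 + y^2} \quad \text{if } y > 0.$$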
Apart from providing, via (1.2), crude bounds for the power of $\phi$, Theorem
2.1 permits us to draw the following conclusion. If $t_n(x)$ satisfies either of the
conditions of the theorem and, for some sequence $\{P_n\}$ of distributions, $H(y) =
\lim \Pr\{t_n(X) \le y\} = 0$ for all real $y$, which is a sufficient condition for consistency
of the tests $\phi_n^*$, then the tests $\phi_n$ are also consistent. This result is
independent of whether $t_n^{(k)}(X)$ converges in probability to a constant.
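A quick numerical check of the second bound of Theorem 2.1 is sketched below. It is an illustration under stated assumptions, not part of the paper: the group is taken to be the sign-change group, the statistic is $t(x) = \sum x_i(\sum x_i^2)^{-1/2}$ (so that $t(Gx)$ has mean 0 and variance 1), and `critical_value` and `cantelli_bounds` are hypothetical names.

```python
from itertools import product
from math import floor, sqrt

def critical_value(x, alpha):
    """t^(k)(x) for t(x) = sum(x) / sqrt(sum(x^2)) under all 2^n sign changes."""
    norm = sqrt(sum(xi * xi for xi in x))
    values = sorted(sum(s * xi for s, xi in zip(signs, x)) / norm
                    for signs in product((1, -1), repeat=len(x)))
    M = len(values)
    k = M - floor(M * alpha)          # k = M - [M alpha]
    return values[k - 1]              # k-th smallest value, 1-based

def cantelli_bounds(alpha):
    """Lower and upper limits for t^(k)(x) as given by the second bound of Theorem 2.1."""
    return -sqrt(alpha / (1 - alpha)), sqrt((1 - alpha) / alpha)
```

For any nonzero `x` and any `alpha` strictly between 0 and 1, `critical_value(x, alpha)` should fall between the two numbers returned by `cantelli_bounds(alpha)`.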







3. Sufficient conditions for the convergence in probability of $t_n^{(k)}(X)$.

THEOREM 3.1. Suppose that for a sequence $\{P_n\}$ of distributions of $X = X^{(n)}$,
$F_n(y, X)$ tends in probability to $F(y)$ for every $y$ at which $F(y)$ is continuous, where
$F(y)$ is a distribution function and the equation $F(y) = 1 - \alpha$ has a unique solution
$y = \lambda$. Then $t_n^{(k)}(X) \to \lambda$ in probability.

PROOF. By the definitions of $t_n^{(k)}(x)$ and $F_n(y, x)$,

$$\Pr\{t_n^{(k)}(X) \le y\} = \Pr\{F_n(y, X) \ge k/M\} \tag{3.1}$$

for every real $y$. Let $y$ be a point of continuity of $F(y)$. Since, by assumption,
$k/M \to 1 - \alpha = F(\lambda)$, and $y < \lambda$ implies $F(y) < F(\lambda)$, the right-hand side of
(3.1) tends to 0 if $y < \lambda$. Similarly it tends to 1 if $y > \lambda$. Hence $t_n^{(k)}(X) \to \lambda$ in
probability.

A sufficient condition for a sequence of random variables to converge in probability
to a constant $c$ is that their means and variances converge, respectively,
to $c$ and 0. If the random variables are uniformly bounded, the condition is also
necessary. Hence $F_n(y, X) \to F(y)$ in probability if and only if

$$EF_n(y, X) \to F(y), \qquad EF_n(y, X)^2 \to F(y)^2. \tag{3.2}$$

We can write

$$F_n(y, x) = M^{-1} \sum_g C(gx),$$

where $C(x) = 1$ or 0 according as $t_n(x) \le y$ or $> y$. Hence

$$EF_n(y, X) = M^{-1} \sum_g \Pr\{t_n(gX) \le y\}, \tag{3.3}$$

$$EF_n(y, X)^2 = M^{-2} \sum_g \sum_{g'} \Pr\{t_n(gX) \le y,\ t_n(g'X) \le y\}. \tag{3.4}$$

Let $G$ be the random transformation defined in Section 2, let $G'$ have the same
distribution as $G$, and let $G$, $G'$ and $X$ be mutually independent. Then equations
(3.3), (3.4) can be written as

$$EF_n(y, X) = \Pr\{t_n(GX) \le y\}, \tag{3.5}$$

$$EF_n(y, X)^2 = \Pr\{t_n(GX) \le y,\ t_n(G'X) \le y\}. \tag{3.6}$$

Note that $t_n(GX)$ and $t_n(G'X)$ are identically distributed, but not independent
(except in the trivial case when the random variable $F_n(y, X)$ has variance 0).
Equations (3.5) and (3.6) imply that (3.2) is satisfied if $t_n(GX)$ has the limiting
distribution function $F(y)$, and $t_n(GX)$ and $t_n(G'X)$ are independent in the limit.
Making use of Theorem 3.1, we can state

THEOREM 3.2. Suppose that, for some sequence $\{P_n\}$ of distributions, $t_n(GX)$ and
$t_n(G'X)$ have the limiting joint distribution function $F(y)F(y')$. Then for every $y$
at which $F(y)$ is continuous

$$F_n(y, X) \to F(y) \quad \text{in probability},$$

and if the equation $F(y) = 1 - \alpha$ has a unique solution $y = \lambda$,

$$t_n^{(k)}(X) \to \lambda \quad \text{in probability}.$$







We also observe the following. If $H$ is true, $t_n(GX)$ and $t_n(X)$ have the same
distribution. Thus if $F_n(y, X) \to F(y)$ in probability for a sequence of distributions
invariant under $\mathcal{G}$, then $t_n(X)$ has the limiting distribution $F(y)$. An implication
concerning the test $\phi_n^*$ was pointed out in the Introduction.

The next theorem, 3.3, gives conditions under which two functions $t_n(x)$ and
$t_n'(x)$ are, in a certain sense, asymptotically equivalent.

THEOREM 3.3. Let $t_n'(x) = c_n(x)t_n(x) + d_n(x)$, where

$$c_n(GX) \to 1 \text{ and } d_n(GX) \to 0 \text{ in probability}, \tag{3.7}$$

and let $F_n'(y, x) = \Pr\{t_n'(Gx) \le y\}$. Then

$$F_n(y, X) \to F(y) \text{ in probability} \tag{3.8}$$

if and only if

$$F_n'(y, X) \to F(y) \text{ in probability}. \tag{3.9}$$

PROOF. It is sufficient to show that (3.8) implies (3.9). As has been seen, (3.8)
is equivalent to $\Pr\{t_n(GX) \le y\} \to F(y)$, $\Pr\{t_n(GX) \le y,\ t_n(G'X) \le y\} \to F(y)^2$.
Due to assumption (3.7) these relations remain true if $t_n(x)$ is replaced by
$t_n'(x)$. This implies (3.9).

The fulfilment of the conditions of Theorem 3.2 can frequently be demonstrated
with the aid of the central limit theorem for vectors. One version of this theorem,
which will be of particular use in Section 6, is stated below as Theorem 3A. It
easily follows from Uspensky's proof [12] of the central limit theorem for vectors.

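THEOREM 3A. Let $(Y_1, Y_1'), (Y_2, Y_2'), \ldots, (Y_n, Y_n')$ be $n$ independent random
vectors, $EY_i = EY_i' = 0$, $E|Y_i|^3 < \infty$, $E|Y_i'|^3 < \infty$. Let

$$Y = \sum_{1}^{n} Y_i \left(\sum_{1}^{n} EY_i^2\right)^{-1/2}, \qquad Y' = \sum_{1}^{n} Y_i' \left(\sum_{1}^{n} EY_i'^2\right)^{-1/2}, \qquad \rho = EYY',$$

$$\omega = \sum_{1}^{n} E|Y_i|^3 \left(\sum_{1}^{n} EY_i^2\right)^{-3/2}, \qquad \omega' = \sum_{1}^{n} E|Y_i'|^3 \left(\sum_{1}^{n} EY_i'^2\right)^{-3/2}.$$

Then for any two real numbers $y$, $y'$

$$|\Pr\{Y \le y,\ Y' \le y'\} - \Phi(y)\Phi(y')| \le f(\rho, \omega, \omega'),$$

where $\Phi(y)$ is defined by (1.5) and the function $f(u, v, w)$ is independent of $n$, $y$, $y'$
and of the distribution of the $Y_i$, $Y_i'$, and $f(u, v, w) \to 0$ as $u \to 0$, $v \to 0$, $w \to 0$.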


4. Test for the median of a symmetrical distribution. Let $\mathcal{X}$ be the Euclidean
$n$-dimensional space and $H$ the hypothesis that the components $X_1, \ldots, X_n$
of the random vector $X$ are independent and each $X_i$ is symmetrically distributed
about the median 0. $H$ implies that the distribution of $X$ is invariant under the
$M = 2^n$ transformations $gX = ((-1)^{j_1}X_1, \ldots, (-1)^{j_n}X_n)$, $j_i = 0$ or 1, $i = 1, \ldots, n$.
The random transformation $Gx$ of $x$ can be written $Gx = (G_1x_1, \ldots, G_nx_n)$,
where $G_1, \ldots, G_n$ are independent, $G_i = -1$ or 1 with probabilities
$\tfrac{1}{2}$, $\tfrac{1}{2}$. Let $\phi(x)$ be the test (1.1) with

$$t(x) = \sum_{1}^{n} x_i \left(\sum_{1}^{n} x_i^2\right)^{-1/2},$$

or $t(x) = 0$ if $\sum x_i^2 = 0$. The factor $(\sum x_i^2)^{-1/2}$ is invariant under the transformations
$g$ and is so chosen that $t(Gx)$ has mean 0 and variance 1 (unless $x_1 = \cdots = x_n = 0$).
Bounds for $t^{(k)}(x)$ can be obtained from Theorem 2.1.

It follows from the results of Lehmann and Stein [8] that the test $\phi$ is most
powerful similar for testing $H$ against the alternative that $X_1, \ldots, X_n$ are
independent with a common normal distribution whose mean is positive; the
test $\phi$ with $t(x)$ replaced by $|t(x)|$ is most stringent similar for testing $H$ against
the alternative of a common normal distribution with nonzero mean. It will
suffice to consider the former, "one-sided" test. The results will be easily applicable
to the "two-sided" case.

Let $Y_i = G_iX_i$, $Y_i' = G_i'X_i$, where all $G_i$, $G_i'$ are independent, identically
distributed, and independent of the $X_i$. Then $Y_i^2 = Y_i'^2 = X_i^2$,

$$t(GX) = n^{-1/2} \sum_{i=1}^{n} Y_i \left(n^{-1} \sum_{i=1}^{n} X_i^2\right)^{-1/2},$$

$$t(G'X) = n^{-1/2} \sum_{i=1}^{n} Y_i' \left(n^{-1} \sum_{i=1}^{n} X_i^2\right)^{-1/2}.$$

Suppose that $X_1, \ldots, X_n$ are independent and identically distributed with
mean $\mu$ and positive variance $\sigma^2$. By Khintchine's theorem,

$$n^{-1} \sum_{i=1}^{n} X_i^2 \to \sigma^2 + \mu^2$$

in probability. Hence $(t(GX), t(G'X))$ has the same limiting distribution (if
any) as

$$\left((\sigma^2 + \mu^2)^{-1/2}\, n^{-1/2} \sum_{1}^{n} Y_i,\ (\sigma^2 + \mu^2)^{-1/2}\, n^{-1/2} \sum_{1}^{n} Y_i'\right). \tag{4.1}$$

The vectors $(Y_1, Y_1'), \ldots, (Y_n, Y_n')$ are independent and identically distributed,
with

$$EY_i = EY_i' = 0, \quad EY_i^2 = EY_i'^2 = \sigma^2 + \mu^2, \quad EY_iY_i' = EG_iG_i'X_i^2 = EG_i\, EG_i'\, EX_i^2 = 0.$$

By the central limit theorem for identically distributed vectors (see, e.g., Cramér
[2], p. 286), the random vector (4.1) has the limiting distribution function
$\Phi(y)\Phi(y')$. The same is true of $(t(GX), t(G'X))$. By Theorem 3.2, $t^{(k)}(X) \to \lambda$ in
probability, where $\Phi(\lambda) = 1 - \alpha$.
Under the same conditions we have for every fixed $y$

$$\lim_{n \to \infty} \Pr\{t(X) \le (y + n^{1/2}\mu/\sigma)(1 + (\mu/\sigma)^2)^{-1/2}\} = \Phi(y).$$

Hence if $\mu/\sigma$ is independent of $n$ (as is implied in the assumptions) and positive,
the function $H(y)$ of Section 1 is $\equiv 0$, and the power of the test tends to 1. It
follows from the Lyapunov form of the central limit theorem and its extension
to vectors (for example, Theorem 3A) that all results remain true if the common
distribution of $X_1, \ldots, X_n$ depends on $n$, provided $E|X_1|^3 = o(n^{1/2})$. If
$(\mu/\sigma)n^{1/2}$ converges to a constant $c$, then $H(y) = \Phi(y - c)$. An alternative
interpretation of this result, with $\mu/\sigma$ fixed but small, is indicated in the Introduction.

The function $t(x)$ is an increasing function of Student's statistic for testing
whether the mean of $n$ independent random variables with a common normal
distribution is zero. Thus the test $\phi_n^*$ of Section 1, with suitably chosen $\lambda_n$, is
equivalent to Student's (one-sided) test whose size (for testing the normal hypothesis)
is equal to the size $\alpha$ of the test $\phi$. The two tests have the same limiting
power under the alternatives considered.
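The monotone relation between $t(x)$ and Student's statistic can be made explicit. Writing Student's statistic as $n^{1/2}\bar{x}/s$ with $s^2 = (n-1)^{-1}\sum(x_i - \bar{x})^2$, a short calculation (not spelled out in the paper, so it is offered here as a derived assumption) gives Student's statistic $= t(x)\{(n-1)/(n - t(x)^2)\}^{1/2}$. The sketch below checks this identity numerically; all function names are hypothetical.

```python
from math import sqrt

def t_sign(x):
    """t(x) = sum(x_i) / sqrt(sum(x_i^2)), the statistic of the sign-change test."""
    return sum(x) / sqrt(sum(xi * xi for xi in x))

def student_statistic(x):
    """One-sample Student statistic sqrt(n) * mean / s for testing mean zero."""
    n = len(x)
    mean = sum(x) / n
    s2 = sum((xi - mean) ** 2 for xi in x) / (n - 1)
    return sqrt(n) * mean / sqrt(s2)

def student_from_t(t, n):
    """Student's statistic expressed as an increasing function of t (derived identity)."""
    return t * sqrt((n - 1) / (n - t * t))
```

Because `student_from_t` is increasing in `t` on the admissible range of `t`, ranking the sign changes by either statistic selects the same critical region.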

Similar results can be obtained for more general alternatives, for instance
when the $X_i$ are not identically distributed, provided only the central limit
theorem can be applied.


5. An analysis of variance test. Let $\mathcal{X}$ be a Euclidean space of $np$ dimensions.
Let $X = (X_1, \ldots, X_n)$ where $X_i = (X_{i1}, \ldots, X_{ip})$, $i = 1, \ldots, n$, are $n$
independent random vectors of $p \ge 2$ components, and let $H$ be a hypothesis
which implies that the distribution of each $X_i$ is invariant under the $p!$ permutations
of its components. Then the distribution of $X$ is invariant under a group
$\mathcal{G}$ of $M = (p!)^n$ permutations. For example, if in an agricultural experiment $p$
treatments are randomly assigned to the $p$ plots in each of $n$ blocks, and $X_{ij}$
is the yield of the plot in the $i$th block which has received the $j$th treatment,
hypothesis $H$ may be assumed to hold when there is no difference in the treatment
effects.

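Let the test $\phi(x)$ be defined by (1.1) with

$$t(x) = \frac{\displaystyle\sum_{j=1}^{p} \left(\sum_{i=1}^{n} (x_{ij} - x_{i\cdot})\right)^2}{\displaystyle (p - 1)^{-1} \sum_{i=1}^{n} \sum_{j=1}^{p} (x_{ij} - x_{i\cdot})^2},$$

where $x_{i\cdot} = p^{-1} \sum_{j=1}^{p} x_{ij}$. If the denominator vanishes, define $t(x) = p - 1$
(say). The denominator, which is invariant under permutations in $\mathcal{G}$, is so chosen
that $Et(Gx) = p - 1$ for all $x$.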
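The normalisation $Et(Gx) = p - 1$ can be confirmed by brute force for a small layout. The sketch below is an illustration only: it reads the statistic as the ratio of $\sum_j (\sum_i (x_{ij} - x_{i\cdot}))^2$ to $(p-1)^{-1}\sum_i\sum_j (x_{ij} - x_{i\cdot})^2$, enumerates all $(p!)^n$ within-block permutations, and uses exact rational arithmetic; `anova_t` and `mean_over_group` are hypothetical names.

```python
from fractions import Fraction
from itertools import permutations, product

def anova_t(x):
    """Statistic t(x) for an n-by-p layout x, given as a sequence of rows (blocks)."""
    n, p = len(x), len(x[0])
    row_means = [Fraction(sum(row), p) for row in x]
    col_sums = [sum(x[i][j] - row_means[i] for i in range(n)) for j in range(p)]
    num = sum(c * c for c in col_sums)
    den = sum((x[i][j] - row_means[i]) ** 2
              for i in range(n) for j in range(p)) / (p - 1)
    return Fraction(p - 1) if den == 0 else num / den

def mean_over_group(x):
    """Average of t(gx) over all (p!)^n permutations within blocks."""
    images = list(product(*(list(permutations(row)) for row in x)))
    return sum(anova_t(img) for img in images) / len(images)
```

For integer data such as `[(1, 4, 2), (0, 3, 3), (5, 1, 2)]` the average returned by `mean_over_group` equals `p - 1` exactly.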

In the traditional analysis of variance one assumes that the $X_{ij}$ are independent
normal with common variance and means $EX_{ij} = b_i + t_j$. The equivalent
of hypothesis $H$ is that $t_1 = \cdots = t_p$. The usual $F$- (or $z$-) statistic for
testing this hypothesis is an increasing function of $t(X)$.

A nonparametric test essentially equivalent to $\phi(x)$ was considered by Fisher
[3] in the case $p = 2$, by Welch [14] and Pitman [11] in the general case.







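Extending the customary alternative, suppose that

$$X_{ij} = Y_{ij} + b_i + t_j, \qquad i = 1, \ldots, n;\ j = 1, \ldots, p,$$

where the $Y_{ij}$ are mutually independent and identically distributed,

$$EY_{ij} = 0, \qquad \operatorname{var} Y_{ij} = \sigma^2 > 0,$$

and the $b_i$ and $t_j$ are constants. It will be assumed that $p$ is fixed and $n \to \infty$.
We can write

$$t(x) = \frac{\displaystyle\sum_{j=1}^{p} u_j(x)^2}{\displaystyle n^{-1} \sum_{i=1}^{n} (p - 1)^{-1} \sum_{j=1}^{p} (x_{ij} - x_{i\cdot})^2},$$

where

$$u_j(x) = n^{-1/2} \sum_{i=1}^{n} (x_{ij} - x_{i\cdot}).$$

Since

$$X_{ij} - X_{i\cdot} = Y_{ij} - Y_{i\cdot} + t_j - \bar{t},$$

where $\bar{t} = p^{-1} \sum_{1}^{p} t_j$, has a distribution independent of $i$, the random variables

$$(p - 1)^{-1} \sum_{j=1}^{p} (X_{ij} - X_{i\cdot})^2, \qquad i = 1, \ldots, n,$$

are independent and identically distributed with mean $\sigma^2(1 + \delta^2)$, where

$$\delta^2 = \sigma^{-2} (p - 1)^{-1} \sum_{j=1}^{p} (t_j - \bar{t})^2.$$

It follows that

$$n^{-1} \sum_{i=1}^{n} (p - 1)^{-1} \sum_{j=1}^{p} (X_{ij} - X_{i\cdot})^2 \to \sigma^2(1 + \delta^2) \quad \text{in probability}.$$

The expression on the left is invariant under the permutations in $\mathcal{G}$. Hence if
we let

$$t'(x) = \sigma^{-2}(1 + \delta^2)^{-1} \sum_{j=1}^{p} u_j(x)^2,$$

then $(t(GX), t(G'X))$ has the same limiting distribution as $(t'(GX), t'(G'X))$.
We have

$$u_j(x) = \sum_{k=1}^{p} (\delta_{jk} - p^{-1})\, v_k(x), \qquad v_k(x) = n^{-1/2} \sum_{i=1}^{n} (x_{ik} - b_i),$$

where $\delta_{jk}$ is Kronecker's delta. Let

$$V_j = v_j(GX), \qquad V_j' = v_j(G'X).$$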







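Then the random vector $n^{1/2}V = n^{1/2}(V_1, \ldots, V_p, V_1', \ldots, V_p')$ is the sum of
$n$ independent random vectors, each of which has the distribution of

$$Z^* = (Z_{R_1}, \ldots, Z_{R_p}, Z_{R_1'}, \ldots, Z_{R_p'}),$$

where $Z_1, \ldots, Z_p$ are independent, $Z_j$ has the distribution of $Y_{ij} + t_j$, and
$(R_1, \ldots, R_p)$ and $(R_1', \ldots, R_p')$ are two independent random vectors, independent
of the $Z_j$, whose values are the $p!$ equally probable permutations of
$(1, \ldots, p)$. By the central limit theorem for sums of identically distributed
vectors, the limiting distribution of $V - EV$ is $2p$-variate normal with the covariance
matrix of $Z^*$. We have

$$EZ_{R_j}^m = E\, p^{-1} \sum_{i=1}^{p} Z_i^m, \qquad j = 1, \ldots, p;\ m = 1, 2,$$

hence

$$EZ_{R_j} = \bar{t}, \qquad \operatorname{var} Z_{R_j} = \sigma^2(1 + (1 - p^{-1})\delta^2).$$

If $j \ne j'$,

$$EZ_{R_j}Z_{R_{j'}} = E\, p^{-1}(p - 1)^{-1} \sum_{i \ne i'} Z_iZ_{i'} = p^{-1}(p - 1)^{-1} \sum_{i \ne i'} t_it_{i'} = p(p - 1)^{-1}\bar{t}^2 - p^{-1}(p - 1)^{-1} \sum_{i=1}^{p} t_i^2,$$

hence

$$\operatorname{cov}(Z_{R_j}, Z_{R_{j'}}) = -\sigma^2 p^{-1} \delta^2, \qquad j \ne j'.$$

The $Z_{R_j'}$ have the same distribution as the $Z_{R_j}$, and since $EZ_{R_j}Z_{R_{j'}'} = E(p^{-1} \sum_{1}^{p} Z_i)^2$
has the same value for all $j$, $j'$, we have

$$\operatorname{cov}(Z_{R_j}, Z_{R_{j'}'}) = C \text{ (say)}, \qquad j, j' = 1, \ldots, p.$$

Hence

$$EV_j = EV_j' = n^{1/2}\bar{t},$$
$$\operatorname{var}(V_j) = \operatorname{var}(V_j') = \sigma^2(1 + (1 - p^{-1})\delta^2),$$
$$\operatorname{cov}(V_j, V_{j'}) = \operatorname{cov}(V_j', V_{j'}') = -\sigma^2 p^{-1} \delta^2, \qquad j \ne j',$$
$$\operatorname{cov}(V_j, V_{j'}') = C.$$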

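Let $\|c_{ij}\|$ be an orthonormal $p \times p$ matrix with $c_{p1} = \cdots = c_{pp}$, and let

$$W_j = \sum_{k=1}^{p} c_{jk}V_k, \qquad W_j' = \sum_{k=1}^{p} c_{jk}V_k'.$$

Then

$$\sum_{j=1}^{p} u_j(GX)^2 = \sum_{j=1}^{p-1} W_j^2, \qquad \sum_{j=1}^{p} u_j(G'X)^2 = \sum_{j=1}^{p-1} W_j'^2.$$

For $j, j' \le p - 1$ we obtain

$$EW_j = EW_j' = 0, \qquad EW_jW_{j'}' = 0,$$
$$EW_jW_{j'} = EW_j'W_{j'}' = \delta_{jj'}\sigma^2(1 + \delta^2).$$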


Hence the limiting distribution of $(W_1, \ldots, W_{p-1}, W_1', \ldots, W_{p-1}')$ is that
of $2p - 2$ independent normal variables, each with mean 0 and variance
$\sigma^2(1 + \delta^2)$. It follows that the limiting distribution of $(t'(GX), t'(G'X))$, and
hence of $(t(GX), t(G'X))$, is that of $(\chi_{p-1}^2, \chi_{p-1}'^2)$, where $\chi_{p-1}^2$ and $\chi_{p-1}'^2$ are
independent, each having the chi-square distribution with $p - 1$ degrees of freedom.
By Theorem 3.2, $t^{(k)}(X) \to \lambda$ in probability, where $\Pr\{\chi_{p-1}^2 > \lambda\} = \alpha$.
The test is asymptotically as powerful as the conventional analysis of variance
test of the same size $\alpha$.


6. Two-sample test; tests of randomness. Let $\mathcal{X}$ be the $n$-dimensional Euclidean
space, and let $H$ be the hypothesis that the $n$ components of $X = (X_1, \ldots, X_n)$
are independent and identically distributed. Then the distribution of
$X$ is invariant under all $M = n!$ permutations of its components. Let $\phi(x)$ be
the test (1.1) with

$$t(x) = \frac{\displaystyle\sum_{1}^{n} (a_i - \bar{a})x_i}{\left\{\displaystyle\sum_{1}^{n} (a_i - \bar{a})^2\, (n - 1)^{-1} \sum_{1}^{n} (x_i - \bar{x})^2\right\}^{1/2}}, \tag{6.1}$$

where $a_1, \ldots, a_n$ are given numbers, not all equal, $\bar{a} = n^{-1} \sum_{1}^{n} a_i$, $\bar{x} = n^{-1} \sum_{1}^{n} x_i$.
The numbers $a_i = a_{ni}$ may depend on $n$. If the denominator vanishes,
that is, if $x_1 = \cdots = x_n$, define $t(x) = 0$. The denominator is invariant
under all permutations, and is so chosen that $t(Gx)$ has mean 0 and variance 1
(unless $x_1 = \cdots = x_n$).


If $X$ has the probability density

$$(2\pi\sigma^2)^{-n/2} \exp\left\{-(2\sigma^2)^{-1} \sum_{1}^{n} (x_i - \xi a_i - \eta)^2\right\} \tag{6.2}$$

and $T(x)$ denotes the standard $t$-statistic for testing $\xi = 0$, then

$$T(x) = (n - 2)^{1/2}\, t(x)\, (n - 1 - t(x)^2)^{-1/2},$$

so that $T(x)$ is an increasing function of $t(x)$.
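The relation between $T(x)$ and $t(x)$ can be verified numerically. The sketch below assumes that the "standard $t$-statistic for testing $\xi = 0$" is the usual least-squares statistic, slope estimate divided by its estimated standard error, in the regression of $x_i$ on $a_i$; that reading, and the names `t_stat_6_1`, `slope_t_statistic` and `T_from_t`, are assumptions made for illustration.

```python
from math import sqrt

def t_stat_6_1(a, x):
    """t(x) of (6.1) for given constants a and observations x."""
    n = len(x)
    a_bar, x_bar = sum(a) / n, sum(x) / n
    num = sum((ai - a_bar) * xi for ai, xi in zip(a, x))
    s_aa = sum((ai - a_bar) ** 2 for ai in a)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return num / sqrt(s_aa * s_xx / (n - 1))

def slope_t_statistic(a, x):
    """Least-squares t-statistic for the slope in x_i = slope * a_i + intercept + error."""
    n = len(x)
    a_bar, x_bar = sum(a) / n, sum(x) / n
    s_aa = sum((ai - a_bar) ** 2 for ai in a)
    s_ax = sum((ai - a_bar) * (xi - x_bar) for ai, xi in zip(a, x))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_ax / s_aa
    rss = s_xx - slope * s_ax                 # residual sum of squares
    return slope / sqrt(rss / (n - 2) / s_aa)

def T_from_t(t, n):
    """T = (n - 2)^(1/2) * t * (n - 1 - t^2)^(-1/2), as stated in the text."""
    return sqrt(n - 2) * t / sqrt(n - 1 - t * t)
```

With 0-1 constants `a` (ones for the first sample, zeros for the second), the same check covers the two-sample $t$-statistic.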
Lehmann and Stein [8] have shown that the test $\phi(x)$ is most powerful similar
for testing $H$ against the alternative (6.2) with $\xi > 0$, and that the test based
on $|t(x)|$ is most stringent similar against (6.2) with $\xi \ne 0$. If

$$a_i = 1 \text{ for } i = 1, \ldots, m; \qquad a_i = 0 \text{ for } i = m + 1, \ldots, n, \tag{6.3}$$

then (6.2) is the probability density of two independent random samples from
two normal distributions with common variance and means $\xi + \eta$ and $\eta$, and
the numerator of $t(x)$ is, apart from a constant factor, the difference of the two
sample means. Essentially this test was proposed by Pitman [11].

We first consider a case where $H$ is true.

THEOREM 6.1. Let $t(x)$ be defined by (6.1), let $Z_1, \ldots, Z_n, \ldots$ be independent
and identically distributed with $E|Z_1|^3 < \infty$ and $\operatorname{var} Z_1 > 0$, and let $Z = Z^{(n)} =
(Z_1, \ldots, Z_n)$. Then in order that for every real $y$

$$F_n(y, Z) \to \Phi(y) \text{ in probability} \tag{6.4}$$

it is necessary and sufficient that either $Z_1$ be normally distributed or

$$\frac{\displaystyle\max_{1 \le i \le n} (a_i - \bar{a})^2}{\displaystyle\sum_{i=1}^{n} (a_i - \bar{a})^2} \to 0. \tag{6.5}$$


Results similar to Theorem 6.1 were obtained by Wald and Wolfowitz [13]
and Noether [9], who gave sufficient conditions (stronger than those of Theorem
6.1) for $F_n(y, Z) \to \Phi(y)$ with probability one, which, of course, implies (6.4).
An argument analogous to that employed by Wald, Wolfowitz, and Noether
will be used in Sections 8-10 below to obtain alternative sufficient conditions
for (6.4).

PROOF OF THEOREM 6.1. We may and shall assume that

$$\bar{a} = 0, \qquad \sum_{1}^{n} a_i^2 = 1, \qquad EZ_1 = 0, \qquad EZ_1^2 = 1. \tag{6.6}$$

Then

$$t(x) = \sum_{1}^{n} a_ix_i \left\{(n - 1)^{-1} \sum_{1}^{n} (x_i - \bar{x})^2\right\}^{-1/2}.$$

Since $\bar{Z} \to 0$ and $n^{-1} \sum_{1}^{n} Z_i^2 \to 1$ in probability we have

$$(n - 1)^{-1} \sum_{1}^{n} (Z_i - \bar{Z})^2 \to 1$$

in probability. Hence $(t(GZ), t(G'Z))$ has the same limiting distribution (if any)
as $(u(GZ), u(G'Z))$, where

$$u(x) = \sum_{1}^{n} a_ix_i.$$

Let $gx = x_r = (x_{r_1}, \ldots, x_{r_n})$, where $r = (r_1, \ldots, r_n)$ is a permutation of
$(1, \ldots, n)$. If $R$ and $R'$ are two independent random vectors, independent of $Z$,
such that $\Pr\{R = r\} = \Pr\{R' = r\} = M^{-1}$ for all $r$, we can write $(GZ, G'Z) =
(Z_R, Z_{R'})$. For any two permutations $r$, $r'$ we have

$$u(Z_r) = \sum_{1}^{n} a_iZ_{r_i} = \sum_{1}^{n} a_{s_i}Z_i,$$

$$u(Z_{r'}) = \sum_{1}^{n} a_iZ_{r_i'} = \sum_{1}^{n} a_{s_i'}Z_i,$$

where $s_i$ and $s_i'$ are defined by

$$r_{s_i} = i, \qquad r_{s_i'}' = i.$$

First suppose that $Z_1$ is normally distributed. Then $u(Z_r)$ and $u(Z_{r'})$ have a
bivariate normal distribution with means 0, variances 1, and correlation coefficient

$$\rho_{rr'} = Eu(Z_r)u(Z_{r'}) = \sum_{1}^{n} a_{s_i}a_{s_i'}.$$

Thus

$$\Pr\{u(Z_r) \le y,\ u(Z_{r'}) \le y'\} = \Phi(y, y', \rho_{rr'}), \tag{6.7}$$

where

$$\Phi(y, y', \rho) = \int_{-\infty}^{y} \int_{-\infty}^{y'} (2\pi)^{-1}(1 - \rho^2)^{-1/2} \exp\left\{-\frac{u^2 - 2\rho uv + v^2}{2(1 - \rho^2)}\right\} dv\, du.$$

If both sides of (6.7) are summed over all $r$, $r'$ and divided by $M^2$, we obtain

$$\Pr\{u(Z_R) \le y,\ u(Z_{R'}) \le y'\} = E\Phi(y, y', \rho_{RR'}). \tag{6.8}$$

The random variable $\rho_{RR'}$ has the same distribution as $\sum_{1}^{n} a_ia_{R_i}$, and we get
$E\rho_{RR'}^2 = (n - 1)^{-1}$. Hence $\rho_{RR'} \to 0$ in probability.
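The second-moment identity for the random correlation coefficient can be checked by enumerating all permutations for a small $n$. The sketch below is illustrative only: `mean_square_rho` is a hypothetical name, the constants are centred and normalised inside the function so that the assumptions $\bar{a} = 0$, $\sum a_i^2 = 1$ of (6.6) hold, and exact rational arithmetic is used.

```python
from fractions import Fraction
from itertools import permutations

def mean_square_rho(a):
    """Average of (sum_i c_i c_{r_i})^2 / (sum_i c_i^2)^2 over all permutations r,
    where c_i = a_i - mean(a); this equals E rho^2 under the normalisation (6.6)."""
    n = len(a)
    a_bar = Fraction(sum(a), n)
    c = [ai - a_bar for ai in a]               # centred constants
    norm = sum(ci * ci for ci in c)            # sum of squares, used to normalise
    total, count = Fraction(0), 0
    for r in permutations(range(n)):
        rho = sum(c[i] * c[r[i]] for i in range(n)) / norm
        total += rho * rho
        count += 1
    return total / count
```

For any integer constants that are not all equal the result is exactly `1 / (n - 1)`, so the mean square tends to 0 as `n` grows.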

Since $\Phi(y, y', \rho) \to \Phi(y)\Phi(y')$ as $\rho \to 0$, we have $\Phi(y, y', \rho_{RR'}) \to \Phi(y)\Phi(y')$ in
probability. And since $\Phi(y, y', \rho)$ is a bounded function, this implies

$$E\Phi(y, y', \rho_{RR'}) \to \Phi(y)\Phi(y').$$

Hence $\Phi(y)\Phi(y')$ is the limiting distribution function of $(u(Z_R), u(Z_{R'}))$, and also
that of $(t(GZ), t(G'Z))$. Relation (6.4) follows from Theorem 3.2.
Next suppose that (6.5) is satisfied. By assumption (6.6) this condition is
equivalent to

$$\max_{1 \le i \le n} |a_i| \to 0. \tag{6.9}$$

If we let $Y_i = a_{s_i}Z_i$, $Y_i' = a_{s_i'}Z_i$, the conditions of Theorem 3A are fulfilled,
and we have $Y = u(Z_r)$, $Y' = u(Z_{r'})$, $\rho = \rho_{rr'}$, $\omega = \omega' = E|Z_1|^3 c_n$, where

$$c_n = \sum_{1}^{n} |a_i|^3.$$

Hence

$$|\Pr\{u(Z_r) \le y,\ u(Z_{r'}) \le y'\} - \Phi(y)\Phi(y')| \le g(\rho_{rr'}, c_n), \tag{6.10}$$

where the function $g(u, v)$ is independent of $n$ and of the distribution of the
$Y_i$, $Y_i'$ (in particular, independent of $r$, $r'$), and $g(u, v) \to 0$ as $u \to 0$, $v \to 0$.
Clearly $g(u, v)$ can be so defined that $g(u, v) \le 1$ for all $u$, $v$.

From (6.10) we obtain in a similar way as before

$$|\Pr\{u(Z_R) \le y,\ u(Z_{R'}) \le y'\} - \Phi(y)\Phi(y')| \le Eg(\rho_{RR'}, c_n). \tag{6.11}$$







Since $c_n \le \max |a_i| \sum_{1}^{n} a_i^2 = \max |a_i|$, condition (6.9) implies that $c_n \to 0$.
Since $\rho_{RR'} \to 0$ in probability, $g(\rho_{RR'}, c_n) \to 0$ in probability; and since $g(u, v)$
is bounded, $Eg(\rho_{RR'}, c_n) \to 0$. Relation (6.4) now follows from (6.11) by Theorem 3.2.

Now suppose that $Z_1$ is not normal and (6.5) is not satisfied, the remaining
assumptions of the theorem being fulfilled. Still assuming that the $a_i$ satisfy
(6.6), denote by $A_n$ an $a_i$ for which $|a_i| = \max(|a_1|, \ldots, |a_n|)$. Then infinitely
many $|A_n|$ are greater than a positive constant, and since the $A_n$ are
bounded, a subsequence $\{A_{n_m}\}$ of $\{A_n\}$ converges to a constant $A \ne 0$. We can
write $u(Z) = u_n(Z) = V_n + W_n$, where $V_n$ has the distribution of $A_nZ_1$ and
is independent of $W_n$. As $m \to \infty$, $V_{n_m}$ has the limiting distribution of $AZ_1$,
which is not normal.

Suppose (6.4) were true. Then $t(Z_R)$, and hence $u_n(Z_R)$, would have the limiting
distribution $\Phi(y)$. But $Z_R$ has the same distribution as $Z$. It would follow
that $u_{n_m}(Z)$ tends in distribution to a normal random variable which is the
sum of two independent, nonnormal random variables. By a theorem of Cramér
([1], p. 52) this is impossible. The proof is complete.

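In the sequel an extension by the author [7] of a theorem of Wald and Wolfowitz
[13] will be required which, for purposes of reference, is stated below as
Theorem 6A. For every positive integer $n$ let $a = (a_1, \ldots, a_n)$, $b = (b_1, \ldots, b_n)$
be two vectors whose components $a_i$, $b_i$ are real numbers and may depend on
$n$. Suppose that the $a_i$ are not all equal and the $b_i$ are not all equal. Let the random
vector $R = (R_1, \ldots, R_n)$ be defined as in the proof of Theorem 6.1, and
let

$$F_n(y, a, b) = \Pr\left\{\frac{(n - 1)^{1/2} \displaystyle\sum_{1}^{n} (a_i - \bar{a})\, b_{R_i}}{\left[\displaystyle\sum_{1}^{n} (a_i - \bar{a})^2 \sum_{1}^{n} (b_i - \bar{b})^2\right]^{1/2}} \le y\right\},$$

where $\bar{a} = n^{-1} \sum_{1}^{n} a_i$, $\bar{b} = n^{-1} \sum_{1}^{n} b_i$.

THEOREM 6A. A sufficient condition for

$$F_n(y, a, b) \to \Phi(y) \tag{6.12}$$

as $n \to \infty$ is that

$$n^{p/2 - 1}\, \frac{\displaystyle\sum_{1}^{n} (a_i - \bar{a})^p \sum_{1}^{n} (b_i - \bar{b})^p}{\left[\displaystyle\sum_{1}^{n} (a_i - \bar{a})^2\right]^{p/2} \left[\displaystyle\sum_{1}^{n} (b_i - \bar{b})^2\right]^{p/2}} \to 0, \qquad p = 3, 4, \ldots. \tag{6.13}$$

Condition (6.13) is satisfied if

$$n\, \frac{\max (a_i - \bar{a})^2}{\displaystyle\sum_{1}^{n} (a_i - \bar{a})^2} \cdot \frac{\max (b_i - \bar{b})^2}{\displaystyle\sum_{1}^{n} (b_i - \bar{b})^2} \to 0. \tag{6.14}$$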







The next theorem is concerned with the behavior of $t^{(k)}(X)$ under an alternative which generalizes (6.2).

THEOREM 6.2. Let $t(x)$ be defined by (6.1), and suppose that

$$X_i = Z_i + d_i,$$

where $Z_1, \ldots, Z_n, \ldots$ are independent and identically distributed with
E|\|Z%|/)<@ 
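and $\operatorname{var} Z_1 > 0$, and $d_1, \ldots, d_n$ are constants (which may depend on $n$). Then

$$t^{(k)}(X) \to \lambda \ \text{ in probability}, \tag{6.15}$$

where $\Phi(\lambda) = 1 - \alpha$, if

$$Z_1 \text{ is normal or } \frac{\max_{1 \le i \le n} (a_i - \bar a)^2}{\sum_{1}^{n} (a_i - \bar a)^2} \to 0 \tag{6.16}$$

and

$$n^{p/2 - 1}\, \frac{\sum_{1}^{n} (a_i - \bar a)^p \sum_{1}^{n} (d_i - \bar d)^p}{\left[\sum_{1}^{n} (a_i - \bar a)^2\right]^{p/2} \left[\sum_{1}^{n} (d_i - \bar d)^2\right]^{p/2}} \to 0, \qquad p = 3, 4, \ldots, \tag{6.17}$$

the latter condition being satisfied if

$$n\, \frac{\max (a_i - \bar a)^2}{\sum (a_i - \bar a)^2}\, \frac{\max (d_i - \bar d)^2}{\sum (d_i - \bar d)^2} \to 0. \tag{6.18}$$

Relation (6.15) also holds if (6.16) is satisfied and

$$n^{-1} \sum_{1}^{n} (d_i - \bar d)^2 \to 0 \tag{6.19}$$

or if (6.17) is satisfied and

$$n^{-1} \sum_{1}^{n} (d_i - \bar d)^2 \to \infty. \tag{6.20}$$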

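PROOF. We again make the simplifying assumptions (6.6). In addition we may set

$$\bar d = 0.$$

We then have

$$X_i - \bar X = Z_i - \bar Z + d_i,$$

$$n^{-1} \sum_{1}^{n} (X_i - \bar X)^2 = n^{-1} \sum_{1}^{n} (Z_i - \bar Z)^2 + D_n^2 + 2 D_n s_n,$$

where

$$D_n = \Big(n^{-1} \sum_{1}^{n} d_i^2\Big)^{1/2}, \qquad s_n = n^{-1} \sum_{1}^{n} Z_i d_i \Big(n^{-1} \sum_{1}^{n} d_i^2\Big)^{-1/2}.$$

We have $n^{-1} \sum_{1}^{n} (Z_i - \bar Z)^2 \to 1$ in probability. Since $E s_n^2 = n^{-1}$, $s_n \to 0$ in probability. Also $0 \le 2 D_n \le 1 + D_n^2$. Hence

$$\frac{n^{-1} \sum_{1}^{n} (X_i - \bar X)^2}{1 + D_n^2} \to 1 \ \text{ in probability}.$$

Thus if we let

$$t'(x) = (1 + D_n^2)^{-1/2} \sum_{1}^{n} a_i x_i = (1 + D_n^2)^{-1/2} u(x),$$

then $(t(GX), t(G'X)) = (t(X_R), t(X_{R'}))$ has the same limiting distribution (if any) as $(t'(X_R), t'(X_{R'}))$.

Let

$$v(r) = \sum_{1}^{n} a_i d_{r_i} \Big(n^{-1} \sum_{1}^{n} d_i^2\Big)^{-1/2}.$$

Then

$$t'(X_R) = \frac{u(Z_R) + D_n v(R)}{(1 + D_n^2)^{1/2}}, \qquad t'(X_{R'}) = \frac{u(Z_{R'}) + D_n v(R')}{(1 + D_n^2)^{1/2}}. \tag{6.21}$$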


Suppose that conditions (6.16) and (6.17) are satisfied, and consider the joint distribution of $u(Z_R)$, $v(R)$, $u(Z_{R'})$, $v(R')$ as $n \to \infty$. It is seen from the proof of Theorem 6.1 that if (6.16) holds true and $p_n M^2$ denotes the number of pairs of permutations $(r, r')$ for which $|\Pr\{u(Z_r) \le y,\ u(Z_{r'}) \le y'\} - \Phi(y)\Phi(y')|$ is less than a positive constant, then $p_n \to 1$ as $n \to \infty$. By the continuity theorem for the Fourier transform an analogous relation holds for the difference of the characteristic functions,

$$E \exp\big(itu(Z_r) + it'u(Z_{r'})\big) - \exp\big(-\tfrac{1}{2}t^2 - \tfrac{1}{2}t'^2\big).$$

Hence it follows that the characteristic function of $(v(R), v(R'), u(Z_R), u(Z_{R'}))$,

$$M^{-2} \sum_{r} \sum_{r'} \exp\big(i\tau v(r) + i\tau' v(r')\big)\, E \exp\big(itu(Z_r) + it'u(Z_{r'})\big),$$

differs arbitrarily little from

$$E\, e^{i\tau v(R) + i\tau' v(R')}\, e^{-\frac{1}{2}t^2 - \frac{1}{2}t'^2}$$

if $n$ is sufficiently large. By Theorem 6A, condition (6.17) implies that $v(R)$ and $v(R')$ have the standard normal limiting distribution. Hence the limiting joint distribution of $v(R)$, $v(R')$, $u(Z_R)$, $u(Z_{R'})$ is that of four independent standard normal random variables. By (6.21) this implies that $(t'(X_R), t'(X_{R'}))$, and hence $(t(GX), t(G'X))$, has the limiting distribution function $\Phi(y)\Phi(y')$.







If (6.19) is satisfied, then $D_n \to 0$. Since $E\,v(R)^2 = n(n-1)^{-1}$ is bounded, this implies that $D_n v(R) \to 0$ in probability, and $(t'(X_R), t'(X_{R'}))$ has the same limiting distribution as $(u(Z_R), u(Z_{R'}))$. When (6.16) holds, we can apply Theorem 6.1.

Similarly, if (6.20) is satisfied, $(t'(X_R), t'(X_{R'}))$ has the limiting distribution of $(v(R), v(R'))$, which, under condition (6.17), is given by Theorem 6A. In every case the limiting distribution of $(t(GX), t(G'X))$ is $\Phi(y)\Phi(y')$, and relation (6.15) follows from Theorem 3.2. That condition (6.18) is sufficient for (6.17) is stated in Theorem 6A. This completes the proof.

If, in particular, XY has the normal distribution (6.2), we have d; = ai + », 
and the conditions of Theorem 6.2 are fulfilled if either 


> (a; — a)’ 


sa (s-2 
(6.22 net-2 _} 0, 


fea 


max (a; — a)’ 
(6.23) n? isis* - — 0, 


> (a; reek a)’ 
1 
(which implies (6.22)), or 
(6.24) n' > (a; — 4)? 0. 
1 


In the two-sample case (6.3) the conditions (6.22) and (6.23) are both equiva- 


lent to 
nin'yt = w(MY 0 
n 


where m’ = min (m, n — m). Condition (6.24) is fulfilled if and only if 
, 
= +0. 
n 
At least one of the two conditions is satisfied if m/n tends to some limit. 

If the conditions of Theorem 6.2 up to and including (6.16) are satisfied, $t(X)$ is asymptotically normally distributed as $n \to \infty$. If the power of Student's (one-sided) test of size $\alpha$ tends to a limit, the power of $\phi$ tends to the same limit. Theorems 6.1 and 6.2 can be easily extended to the case where $Z_1, \ldots, Z_n$ have a common distribution which depends on $n$.


7. The two-sample test when one sample is small. It is of some interest to investigate what happens when the necessary and sufficient condition of Theorem 6.1 is not satisfied. In the two-sample case, which will be discussed in this section, this occurs only if $m$ or $n - m$ does not tend to infinity with $n$.

We first consider a somewhat more general situation. Let $\mathfrak{X}$ be the Euclidean $n$-dimensional space and $\mathfrak{G}$ the group of all $M = n!$ permutations of the $n$ coordinates of a point in $\mathfrak{X}$. Let the components $X_1, \ldots, X_n$ of $X$ be independent. The function $t(x)$ can be arbitrary, subject only to the conditions to be stated.

First assume that $t(x) = u(x_1, \ldots, x_m)$ is a function of $x_1, \ldots, x_m$ only, where $m$ is fixed as $n \to \infty$. The proportion of pairs of permutations $r$, $r'$ for which the sets $(r_1, \ldots, r_m)$ and $(r'_1, \ldots, r'_m)$ have no elements in common tends to 1 as $n \to \infty$. Hence $t(X_r)$ and $t(X_{r'})$ are independent for a proportion of pairs $r$, $r'$ which converges to 1. Suppose now that $X_1, \ldots, X_m$ have a common distribution and $X_{m+1}, \ldots, X_n$ have a common distribution. Then for a proportion of permutations $r$ which tends to 1, $t(X_r)$ has the distribution function of $u(X_{m+1}, \ldots, X_{2m})$, which will be denoted by $F(y)$. It follows that $(t(X_R), t(X_{R'}))$ has the limiting distribution function $F(y)F(y')$. If the equation $F(y) = 1 - \alpha$ has a unique solution $y = \lambda$, $t^{(k)}(X) \to \lambda$ in probability by Theorem 3.2.
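The claim that the index sets are disjoint for a proportion of pairs tending to 1 can be made concrete. For a fixed $r$, the first $m$ entries of $r'$ avoid the set $\{r_1, \ldots, r_m\}$ for a fraction $\binom{n-m}{m} / \binom{n}{m}$ of all permutations $r'$. This closed form is a standard counting identity and is not stated in the text; the function name below is illustrative. A minimal sketch:

```python
from math import comb


def disjoint_proportion(n, m):
    """Fraction of permutation pairs (r, r') whose first-m index sets are disjoint."""
    # comb(n - m, m) is 0 when n - m < m, so the result is 0 for 2m > n.
    return comb(n - m, m) / comb(n, m)


for n in (6, 20, 100, 10_000):
    print(n, disjoint_proportion(n, 3))
```

For fixed $m$ the ratio behaves like $(1 - m/n)^m$, so it approaches 1 at rate $O(1/n)$.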

The same conclusions hold under the more general assumption that $t(x)$ is of the form $c(x)u(x_1, \ldots, x_m) + d(x)$, where $c(X_R) \to 1$ and $d(X_R) \to 0$ in probability, as follows from Theorem 3.3.

Now let $t(x)$ be defined by (6.1) with the $a_i$ given by (6.3). Then


_ {m(n — m) < ee ee OF os ad 
t(z) = {nie — =) x (x; ay od zi mz). 
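Suppose that $m$ is fixed, and that the common distribution of $X_{m+1}, \ldots, X_n$ has mean $\mu$ and variance $\sigma^2$. Then

$$\bar X = n^{-1} \sum_{1}^{n} X_i \to \mu, \qquad n^{-1} \sum_{1}^{n} (X_i - \bar X)^2 \to \sigma^2$$

in probability. Hence the preceding results can be applied with

$$u(x_1, \ldots, x_m) = m^{-1/2} \sigma^{-1} \sum_{1}^{m} (x_i - \mu).$$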


Observe that the probability limit $\lambda$ of $t^{(k)}(X)$ depends on the distribution of $X_{m+1}$. Now it follows from [8] that the two-sample test $\phi$ is most powerful similar for testing $H$ not only against the normal alternative (6.2), (6.3), but also against any alternative with a density of the form

$$\prod_{i=1}^{m} f(x_i, \theta_1) \prod_{i=m+1}^{n} f(x_i, \theta_2), \tag{7.1}$$

where

$$f(y, \theta) = A(\theta)B(y)e^{\theta y}, \qquad \theta_1 > \theta_2.$$


On the other hand, the most powerful test of size $\alpha$ for testing that $X_1, \ldots, X_n$ are independent with the common density $f(y, \theta)$, where

$$\theta = (m\theta_1 + (n - m)\theta_2)/n,$$

against (7.1) is of the form $\phi^*(x) = 1$ or $0$ according as $m^{-1/2} \sum_{1}^{m} (x_i - \mu) > c_n$ or $< c_n$, where $\sigma^{-1} c_n$ converges to the probability limit $\lambda$ of $t^{(k)}(X)$. In other words, $t^{(k)}(X)$ always tends in probability to the "correct" value $\lambda$, so that the test $\phi$ is asymptotically as powerful as the most powerful parametric test for the case where the function $f(y, \theta)$ is known. This phenomenon is analogous to the relation between, say, the one-sided two-sample $t$-test for normal distributions with unknown variance $\sigma^2$ and the most powerful tests (corresponding to the different values of $\sigma^2$) when $\sigma^2$ is known.


8. An alternative approach. In the remaining part of the paper an alternative method of proving that $F_n(y, X)$ tends to $F(y)$ in probability will be considered. It is an extension of an argument used by Wald and Wolfowitz [13].

Suppose that the quantities $a_i = a_{ni}$, $b_i = b_{ni}$ in Theorem 6A are random variables which have a joint distribution for all $i$ and all $n$, and suppose that one of the conditions (6.13), (6.14) is satisfied with probability 1. Then (6.12) holds with probability 1.

For example, let $X = (U_1, V_1, U_2, V_2, \ldots, U_n, V_n)$, where the pairs $(U_i, V_i)$, $i = 1, \ldots, n$, are independent and identically distributed, and let $H$ be the hypothesis that $U_i$ and $V_i$ are independent. When $H$ is true, the distribution of $X$ is invariant under the $M = (n!)^2$ permutations which permute $(U_1, \ldots, U_n)$ or $(V_1, \ldots, V_n)$. Let

$$t(x) = \frac{(n-1)^{1/2} \sum_{1}^{n} (u_i - \bar u)(v_i - \bar v)}{\left\{\sum_{1}^{n} (u_i - \bar u)^2 \sum_{1}^{n} (v_i - \bar v)^2\right\}^{1/2}}.$$

A test equivalent to the corresponding test $\phi(x)$ was considered by Pitman [11].

Since $t(x)$ is invariant under permutations of the pairs $(u_i, v_i)$, the distribution of $t(Gx)$ is the same as the conditional distribution with $u_1, \ldots, u_n$ held in a fixed order and only the $v_i$ permuted in all possible ways. Hence if in Theorem 6A we let $a_i = u_i$, $b_i = v_i$, then $F_n(y, a, b)$ is identical with $F_n(y, x)$. Suppose that $U_1$ and $V_1$ have finite moments of any order. Then the strong law of large numbers implies that condition (6.13) of Theorem 6A, with $a_i = U_i$, $b_i = V_i$, is satisfied with probability 1. Hence $F_n(y, X) \to \Phi(y)$ with probability 1, and a fortiori in probability. Theorem 3.1 can now be applied.
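The permutation test of independence described here can be sketched as follows. This is a Monte Carlo approximation, not the paper's procedure: it samples random permutations of the $v_i$ instead of enumerating all of them, it uses a one-sided rejection rule, and it adds 1 to the numerator and denominator of the p-value. The function names, the data, and those three choices are assumptions made for illustration. The constant factor $(n-1)^{1/2}$ is omitted because it does not change the ordering of the permuted values.

```python
import random
from math import sqrt


def corr(u, v):
    """Sample correlation coefficient of the paired values (u_i, v_i)."""
    n = len(u)
    ubar = sum(u) / n
    vbar = sum(v) / n
    num = sum((x - ubar) * (y - vbar) for x, y in zip(u, v))
    den = sqrt(sum((x - ubar) ** 2 for x in u) * sum((y - vbar) ** 2 for y in v))
    return num / den


def permutation_pvalue(u, v, n_perm=2000, seed=0):
    """One-sided Monte Carlo p-value: u is held fixed and v is shuffled."""
    rng = random.Random(seed)
    observed = corr(u, v)
    v_perm = list(v)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(v_perm)
        # Small tolerance so that ties with the observed value are counted.
        if corr(u, v_perm) >= observed - 1e-12:
            count += 1
    return (count + 1) / (n_perm + 1)


# Hypothetical data with a strong increasing relationship.
u = [1, 2, 3, 4, 5, 6, 7, 8]
v = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.4]
p = permutation_pvalue(u, v)
print(p)
```

Holding the $u_i$ fixed and permuting only the $v_i$ mirrors the reduction in the text from $(n!)^2$ permutations to the $n!$ permutations of one coordinate.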

That condition (6.13) is satisfied with probability 1 can be shown under weaker assumptions (cf. Noether [9], [10]). Since, however, only weak convergence (in probability) of $F_n(y, X)$ is required for our purposes, a proof of strong convergence seems redundant. In fact, it will be shown in Sections 9 and 10 that if the conditions of Theorem 6A are satisfied as limits in probability, then the conclusion holds as a limit in probability.


9. Ordinary convergence and convergence in probability. Let $f_n(x_n)$, $g_n(x_n)$, $n = 1, 2, \ldots$, be two sequences of real-valued functions of elements $x_n$ in a space $\mathfrak{X}_n$. Let $X_n$ denote a random variable with values in $\mathfrak{X}_n$, $n = 1, 2, \ldots$.

THEOREM 9.1. If $f_n(x_n) \to 0$ implies $g_n(x_n) \to 0$, then $f_n(X_n) \to 0$ in probability implies $g_n(X_n) \to 0$ in probability.

PROOF. Suppose the theorem were false. Then there exist two sequences of functions $\{f_n\}$, $\{g_n\}$ such that $f_n(x_n) \to 0$ implies $g_n(x_n) \to 0$, and a sequence of random variables $\{X_n\}$ such that $f_n(X_n) \to 0$ in probability but, for some $\delta > 0$ and some $\epsilon > 0$, $\Pr\{|g_n(X_n)| > \delta\} > \epsilon$ for infinitely many $n$. Let $m$ be an arbitrary positive integer. Consider the events

$$A_n = \{|g_n(X_n)| > \delta\}, \qquad B_n^{(m)} = \{|f_n(X_n)| < m^{-1}\}.$$

We have $\Pr\{A_n\} > \epsilon$ for infinitely many $n$, and there exists a number $N_m$ such that $\Pr\{B_n^{(m)}\} > 1 - \tfrac{1}{2}\epsilon$ for $n > N_m$. If $A_n \cdot B_n^{(m)}$ denotes the joint occurrence of $A_n$ and $B_n^{(m)}$,

$$\Pr\{A_n \cdot B_n^{(m)}\} \ge \Pr\{A_n\} + \Pr\{B_n^{(m)}\} - 1 > \epsilon + 1 - \tfrac{1}{2}\epsilon - 1 > 0$$

for infinitely many $n$.

Hence for every positive integer $m$ there exists a sequence $\{x_n^{(m)}\}$, $x_n^{(m)} \in \mathfrak{X}_n$, such that $|f_n(x_n^{(m)})| < m^{-1}$ for $n > N_m$ and $|g_n(x_n^{(m)})| > \delta$ for infinitely many $n$. For every $m = 1, 2, \ldots$ there exists an integer $K_m \ge N_{m+1}$ such that

$$|g_{K_m}(x_{K_m}^{(m)})| > \delta$$

and $K_1 < K_2 < \cdots$. Let $K_0 = 0$,

$$x'_n = x_n^{(m)} \quad \text{for } n = K_{m-1} + 1, \ldots, K_m; \quad m = 1, 2, \ldots.$$

Then $|f_n(x'_n)| < m^{-1}$ for $n > K_m$, hence $f_n(x'_n) \to 0$, and $|g_n(x'_n)| > \delta$ for infinitely many $n$. But this contradicts the assumption.

Let, in particular, $x_n$ be the vector $(a, b)$ of Theorem 6A, $f_n$ the left-hand side of (6.14) and $g_n = F_n(y, a, b) - \Phi(y)$. Then Theorem 9.1 shows that if $a$ and/or $b$ are replaced by random vectors, the fulfilment of (6.14) in probability implies that (6.12) holds in probability. Theorem 9.1 does not suffice to draw the same conclusion if the infinitely many relations (6.13) are satisfied in probability. That the conclusion is permissible will be shown in Section 10.

We conclude this section by stating, without proof, conditions which imply the fulfilment of (6.14) in probability. It can be shown that

$$n^{1 - 2/h}\, \frac{\max_{1 \le i \le n} (X_i - \bar X)^2}{\sum_{1}^{n} (X_i - \bar X)^2} \to 0 \ \text{ in probability} \tag{9.1}$$

if $X_1, \ldots, X_n, \ldots$ are independent, identically distributed, $E|X_1|^h < \infty$ for some $h \ge 2$. Relation (9.1) with $h = 2$ also holds if $X_1, \ldots, X_n$ are independent with common mean and finite second moments and satisfy the Lindeberg condition of the central limit theorem. More generally, (9.1) holds if $EX_i = d_i$, the $X'_i = X_i - d_i$ satisfy one of the previously stated conditions and

$$n^{1 - 2/h}\, \frac{\max_{1 \le i \le n} (d_i - \bar d)^2}{\sum_{1}^{n} (d_i - \bar d)^2} \to 0.$$

Hence one can obtain alternative sufficient conditions for $t^{(k)}(X) \to \lambda$ in probability in the examples of Sections 6 and 8. Thus in the case of Section 8 it is sufficient that $U_i$ and $V_i$ have finite moments of order 4.


10. The second limit theorem for random distributions. A generalization by Fréchet and Shohat [5] of Markov's so-called second limit theorem of probability theory states that if the distribution function $F(y)$ is uniquely determined by its moments and $\{F_n(y)\}$ is a sequence of distribution functions whose moments converge to the corresponding moments of $F(y)$, then $F_n(y) \to F(y)$ at every point of continuity of $F(y)$. An extension of this theorem to the case where the $F_n(y)$ are random distribution functions and ordinary convergence is replaced by convergence in probability was given by M. N. Ghosh [6] under certain additional assumptions concerning $F(y)$ and its moments. The following theorem shows that the extension holds with no restrictions.

THEOREM 10.1. Let $F(y)$ be a distribution function on the real line which is uniquely determined by its moments

$$\mu_k = \int_{-\infty}^{\infty} y^k\, dF(y), \qquad k = 1, 2, \ldots.$$

Let $\{F_n(y)\}$, $n = 1, 2, \ldots$, be a sequence of random distribution functions with moments $\mu_{nk}$, and suppose that

$$\mu_{nk} \to \mu_k \ \text{ in probability as } n \to \infty, \qquad k = 1, 2, \ldots.$$

Then

$$F_n(y) \to F(y) \ \text{ in probability}$$

at every point of continuity of $F(y)$.

The proof is based on the following lemma. 

LEMMA 10.1. Let $F(y)$ be a distribution function which is uniquely determined by its moments $\mu_k$, $k = 1, 2, \ldots$. Then for every $y'$ at which $F(y)$ is continuous and for every $\epsilon > 0$ there exist a positive integer $m = m(y', \epsilon)$ and a positive number $\delta = \delta(y', \epsilon)$ such that for every distribution function $G(y)$ whose moments $\nu_k$ satisfy the inequalities

$$|\nu_k - \mu_k| < \delta, \qquad k = 1, \ldots, m,$$

we have

$$|G(y') - F(y')| < \epsilon.$$

PROOF.* Assume the lemma to be false. Then for some $y'$ at which $F(y)$ is continuous and for some $\epsilon > 0$ there do not exist positive numbers $m$, $\delta$ for which the conclusion of the lemma holds. Hence for every positive integer $m$ there exists a distribution function $G_m(y)$ with moments $\nu_{mk}$ such that

$$|\nu_{mk} - \mu_k| < m^{-1}, \qquad k = 1, \ldots, m,$$

and

$$|G_m(y') - F(y')| \ge \epsilon.$$

But $\{G_m(y)\}$, $m = 1, 2, \ldots$, is a sequence of distribution functions whose moments, $\nu_{mk}$, converge to $\mu_k$ for all $k = 1, 2, \ldots$. By the aforementioned theorem of Fréchet and Shohat, $G_m(y') \to F(y')$, which leads to a contradiction.

* The author is indebted to H. Robbins for the proof of Lemma 10.1.

PROOF OF THEOREM 10.1. Let $y'$ be a point of continuity of $F(y)$. Given $\epsilon > 0$, let $m = m(y', \epsilon)$, $\delta = \delta(y', \epsilon)$ be defined as in Lemma 10.1. Given $\eta > 0$, choose $N$ so that

$$\Pr\{|\mu_{nk} - \mu_k| < \delta,\ k = 1, \ldots, m\} > 1 - \eta \quad \text{for } n > N.$$

It follows from Lemma 10.1 that $|F_n(y') - F(y')| < \epsilon$ with probability $> 1 - \eta$ for $n > N$. The proof is complete.

It will now be shown that if the relations (6.13) are satisfied as limits in probability, (6.12) holds in probability. It can be seen from the proof of Theorem 6A in [7] that if (6.13) holds for $p = 3, 4, \ldots, k$, then the moments up to order $k$ of the distribution $F_n(y, a, b)$ converge to the corresponding moments of $\Phi(y)$. By Theorem 9.1 this implies that if (6.13) holds in probability for every $p = 3, 4, \ldots$, then every moment of $F_n(y, a, b)$ converges in probability to the corresponding moment of $\Phi(y)$. By Theorem 10.1, $F_n(y, a, b) \to \Phi(y)$ in probability.

Relations (6.13) can be shown to hold in probability under conditions which are slightly weaker than those indicated at the end of Section 9, though the gain does not seem to be considerable.


REFERENCES 


[1] H. Cramér, Random Variables and Probability Distributions, Cambridge Tracts in Mathematics, No. 36, Cambridge University Press, 1937.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[3] R. A. Fisher, The Design of Experiments, 5th ed., Hafner Publishing Co., New York, 1949.

[4] M. Fréchet, Généralités sur les probabilités. Variables aléatoires. Gauthier-Villars, Paris, 1937.

[5] M. Fréchet and J. Shohat, "A proof of the generalized second limit theorem in the theory of probability," Trans. Am. Math. Soc., Vol. 33 (1931), pp. 533-543.

[6] M. N. Ghosh, "Convergence of random distribution functions," Bull. Calcutta Math. Soc., Vol. 42 (1950), pp. 217-226.

[7] W. Hoeffding, "A combinatorial central limit theorem," Annals of Math. Stat., Vol. 22 (1951), pp. 558-566.

[8] E. L. Lehmann and C. Stein, "On the theory of some non-parametric hypotheses," Annals of Math. Stat., Vol. 20 (1949), pp. 28-45.

[9] G. E. Noether, "On a theorem by Wald and Wolfowitz," Annals of Math. Stat., Vol. 20 (1949), pp. 455-458.

[10] G. E. Noether, "Asymptotic properties of the Wald-Wolfowitz test of randomness," Annals of Math. Stat., Vol. 21 (1950), pp. 231-246.

[11] E. J. G. Pitman, "Significance tests which may be applied to samples from any population," Jour. Roy. Stat. Soc. Suppl., Vol. 4 (1937), pp. 119-130.
"Significance tests which may be applied to samples from any population. II. The correlation coefficient test," Jour. Roy. Stat. Soc. Suppl., Vol. 4 (1937), pp. 225-232.
"Significance tests which may be applied to samples from any population. III. The analysis of variance test," Biometrika, Vol. 29 (1938), pp. 322-335.

[12] J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill Book Co., 1937.

[13] A. Wald and J. Wolfowitz, "Statistical tests based on permutations of the observations," Annals of Math. Stat., Vol. 15 (1944), pp. 358-372.

[14] B. L. Welch, "On the z-test in randomized blocks and Latin squares," Biometrika, Vol. 29 (1937), pp. 21-52.





ASYMPTOTIC THEORY OF CERTAIN "GOODNESS OF FIT" CRITERIA
BASED ON STOCHASTIC PROCESSES


T. W. ANDERSON AND D. A. DARLING¹


Columbia University and University of Michigan


1. Summary. The statistical problem treated is that of testing the hypothesis that $n$ independent, identically distributed random variables have a specified continuous distribution function $F(x)$. If $F_n(x)$ is the empirical cumulative distribution function and $\psi(t)$ is some nonnegative weight function ($0 \le t \le 1$), we consider $n^{1/2} \sup_{-\infty < x < \infty} \{|F_n(x) - F(x)|\, \psi^{1/2}[F(x)]\}$ and $n \int_{-\infty}^{\infty} [F(x) - F_n(x)]^2\, \psi[F(x)]\, dF(x)$. A general method for calculating the limiting distributions of these criteria is developed by reducing them to corresponding problems in stochastic processes, which in turn lead to more or less classical eigenvalue and boundary value problems for special classes of differential equations. For certain weight functions including $\psi = 1$ and $\psi = 1/[t(1 - t)]$ we give explicit limiting distributions. A table of the asymptotic distribution of the von Mises $\omega^2$ criterion is given.


2. Introduction. One method of testing the hypothesis that $n$ observations have been drawn from a population with specified distribution function $F(x)$ is to compare the empirical histogram based on dividing the real line into intervals with the hypothetical histogram by means of the $\chi^2$ test. A test which does not involve a subjective grouping of the data consists of comparing the empirical cumulative distribution function with the hypothetical distribution function. Let $F_n(x)$ be the empirical distribution function based on $n$ observations; that is, $F_n(x) = k/n$ if $k$ observations are $\le x$ for $k = 0, 1, \ldots, n$. We wish to consider a convenient measure of the discrepancy or "distance" between two distribution functions. (For a more detailed discussion cf. Wald and Wolfowitz [21].) In accordance with the usual notions of a metric in function space, we treat the following measures:

$$n \int_{-\infty}^{\infty} [F_n(x) - F(x)]^2\, \psi[F(x)]\, dF(x) = W_n^2, \tag{2.1}$$

$$\sup_{-\infty < x < \infty} \sqrt{n}\, |F_n(x) - F(x)|\, \sqrt{\psi[F(x)]} = K_n, \tag{2.2}$$

where $\psi(t)$ ($\ge 0$) is some preassigned weight function.

If a measure $W_n^2$ is adopted, the hypothesis is rejected for those samples for which $W_n^2 > z_1$, say, and if a measure $K_n$ is adopted, the hypothesis is rejected when $K_n > z_2$, say. The numbers $z_1$ and $z_2$ are to be chosen so that when the hypothesis is true the probability of rejection is some specified number (for example, .01 or .05). The main purpose of this paper is to give methods for finding the asymptotic distributions of $W_n^2$ and $K_n$, and, hence, approximate values of the significance points, $z_1$ and $z_2$. We assume that the hypothetical distribution is continuous.

¹ This work was done mainly at the Rand Corporation.

The fundamental ideas for tests of this nature are due to Kolmogorov [11], 
Smirnov [17], Cramér [2], and von Mises [19], and for large n certain tests have 
been developed by them. The present paper treats these tests in somewhat 
more detail, the analysis being greatly expedited by reducing the problems to 
straightforward considerations in the theory of continuous Gaussian stochastic 
processes. This reduction was developed by Doob [6], and used by him to give 
a simplified proof of Kolmogorov’s fundamental result. 

The principal innovation in this paper is the incorporation of a weight func- 
tion to allow more flexibility in the tests. Although we are able to make explicit 
calculations for only a few simple types of weight functions, the principal mathe- 
matical problems are reduced to classical problems in the theory of differential 
equations. 

The function $\psi(t)$, $0 \le t \le 1$, is to be chosen by the statistician so as to weight the deviations according to the importance attached to various portions of the distribution function. This choice depends on the power against the alternative distributions considered most important. The selection of $\psi(t) = 1$ yields $n\omega^2$, the criterion of von Mises, for $W_n^2$, and the criterion of Kolmogorov for $K_n$. For $W_n^2$ to exist for all samples except a set with probability zero, it is necessary and sufficient that the following integrals exist:

$$\int_0^{u_1} u^2 \psi(u)\, du \tag{2.3}$$

for every $u_1$ ($0 < u_1 < 1$),

$$\int_{u_2}^{1} (1 - u)^2 \psi(u)\, du \tag{2.4}$$

for every $u_2$ ($0 < u_2 < 1$).

Given the data $x_1, x_2, \ldots, x_n$ arranged in increasing magnitude (with probability one there are no equalities between any two of them, since the distribution is assumed continuous), we obtain for practical computations the simpler variants of (2.1) and (2.2),

$$W_n^2 = 2 \sum_{j=1}^{n} \Big\{\phi_2[F(x_j)] - \frac{2j - 1}{2n}\, \phi_1[F(x_j)]\Big\} + n \int_0^1 (1 - t)^2 \psi(t)\, dt, \tag{2.5}$$

$$K_n = \frac{1}{\sqrt{n}} \max_j \Big\{\sqrt{\psi[F(x_j)]}\, \max\big[nF(x_j) - (j - 1),\ j - nF(x_j)\big]\Big\}, \tag{2.6}$$

where

$$\phi_1(t) = \int_0^t \psi(s)\, ds, \qquad \phi_2(t) = \int_0^t s\psi(s)\, ds. \tag{2.7}$$
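The computing formula for $W_n^2$ can be checked numerically in the simplest case $\psi(t) = 1$. The sketch below assumes that $\phi_1$ and $\phi_2$ are the integrals of $\psi(s)$ and $s\psi(s)$ from 0 to $t$, so that $\phi_1(t) = t$, $\phi_2(t) = t^2/2$ and $n \int_0^1 (1 - t)^2\, dt = n/3$. It compares the result with the classical Cramér-von Mises computing formula $1/(12n) + \sum_j (u_j - (2j-1)/(2n))^2$ and with a direct midpoint-rule integration of $n \int_0^1 [G_n(t) - t]^2\, dt$. The function names and the sample values $u_j = F(x_j)$ are illustrative.

```python
from bisect import bisect_right


def w2_via_phi(u):
    """W_n^2 from the phi_1/phi_2 computing formula with psi = 1."""
    u = sorted(u)
    n = len(u)
    total = 0.0
    for j, t in enumerate(u, start=1):
        phi1 = t            # integral of psi(s) = 1 from 0 to t
        phi2 = t * t / 2.0  # integral of s * psi(s) from 0 to t
        total += phi2 - (2 * j - 1) / (2 * n) * phi1
    return 2.0 * total + n / 3.0


def w2_classical(u):
    """Classical computing formula for the Cramer-von Mises statistic."""
    u = sorted(u)
    n = len(u)
    return 1.0 / (12 * n) + sum(
        (t - (2 * j - 1) / (2 * n)) ** 2 for j, t in enumerate(u, start=1)
    )


def w2_direct(u, grid=200_000):
    """Midpoint-rule approximation of n * integral of (G_n(t) - t)^2 over [0, 1]."""
    u = sorted(u)
    n = len(u)
    h = 1.0 / grid
    total = 0.0
    for k in range(grid):
        t = (k + 0.5) * h
        g = bisect_right(u, t) / n  # empirical distribution function at t
        total += (g - t) ** 2
    return n * total * h


# Hypothetical transformed observations u_j = F(x_j).
u = [0.05, 0.21, 0.33, 0.58, 0.74, 0.92]
print(w2_via_phi(u), w2_classical(u), w2_direct(u))
```

The agreement of the first two values is an algebraic identity; the third agrees only up to the discretization error of the grid.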





For (2.5) to hold the integrals $\phi_1(t)$, $\phi_2(t)$ must exist; for (2.6) to hold it is necessary and sufficient that

1d 
y(t) | dt 
if ¥(¢) is differentiable (substituting the difference quotient in (2.8) if Y(t) is not 
differentiable). 


(28) (a — dv) | <1 


3. Reduction to a continuous stochastic process. Since $F(x)$ is assumed continuous, we can make the transformation $u = F(x)$. Then the observations are $u_i = F(x_i)$ ($i = 1, 2, \ldots, n$), and under the null hypothesis these can be considered as drawn from the uniform distribution between 0 and 1. Let $G_n(u)$ be the empirical distribution derived from $u_1, \ldots, u_n$. Then $W_n^2$ and $K_n$ are, respectively,

$$W_n^2 = n \int_0^1 [G_n(u) - u]^2 \psi(u)\, du, \tag{3.1}$$

$$K_n = \sup_{0 \le u \le 1} \sqrt{n}\, |G_n(u) - u|\, \sqrt{\psi(u)}. \tag{3.2}$$

For every $0 \le u \le 1$, $Y_n(u) = \sqrt{n}\,[G_n(u) - u]$ is a random variable and the set of these random variables may be considered a stochastic process with parameter $u$. Putting

$$A_n(z) = \Pr\Big\{\int_0^1 Y_n^2(u)\psi(u)\, du \le z\Big\} = \Pr\{W_n^2 \le z\}, \tag{3.3}$$

$$B_n(z) = \Pr\Big\{\sup_{0 \le u \le 1} |Y_n(u)|\, \sqrt{\psi(u)} \le z\Big\} = \Pr\{K_n \le z\}, \tag{3.4}$$

we wish to calculate $A(z) = \lim A_n(z)$, $n \to \infty$, and $B(z) = \lim B_n(z)$, $n \to \infty$, if these limits exist.

For fixed $u_1, u_2, \ldots, u_k$ the joint distribution of $Y_n(u_1), Y_n(u_2), \ldots, Y_n(u_k)$ approaches a $k$-variate normal distribution as $n \to \infty$. Thus the asymptotic process is Gaussian (normal) and is specified by its mean and covariance functions. For finite $n$ we have

$$E(Y_n(u)) = 0, \qquad E(Y_n(u)Y_n(v)) = \min(u, v) - uv. \tag{3.5}$$

The limiting process is a Gaussian process $y(u)$, $0 \le u \le 1$, for which

$$E(y(u)) = 0, \qquad E(y(u)y(v)) = \min(u, v) - uv, \tag{3.6}$$

such that the probability is 1 that $y(u)$ is continuous [6]. Putting

$$a(z) = \Pr\Big\{\int_0^1 y^2(u)\psi(u)\, du \le z\Big\}, \tag{3.7}$$





$$b(z) = \Pr\Big\{\sup_{0 \le u \le 1} |y(u)|\, \sqrt{\psi(u)} \le z\Big\}, \tag{3.8}$$

we wish to conclude that $A(z) = a(z)$ and $B(z) = b(z)$. Having established these equalities we shall be in a position to use certain representation theorems for stochastic processes to great advantage.
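The covariance $\min(u, v) - uv$ of the empirical process holds for every finite $n$, and this can be checked by simulation. The sketch below draws repeated uniform samples, forms $Y_n(u) = \sqrt{n}\,[G_n(u) - u]$ at two points, and averages the product. The sample size, number of replications, seed, and function name are arbitrary choices made for illustration.

```python
import random


def empirical_process_cov(u, v, n=20, reps=20_000, seed=1):
    """Monte Carlo estimate of E[Y_n(u) Y_n(v)] for uniform samples of size n."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        g_u = sum(x <= u for x in sample) / n  # empirical d.f. at u
        g_v = sum(x <= v for x in sample) / n  # empirical d.f. at v
        acc += n * (g_u - u) * (g_v - v)       # Y_n(u) * Y_n(v)
    return acc / reps


estimate = empirical_process_cov(0.3, 0.7)
exact = min(0.3, 0.7) - 0.3 * 0.7
print(estimate, exact)
```

The estimate fluctuates around the exact value with a standard error of roughly $0.002$ for these settings, so agreement to about two decimal places is what one should expect.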

In [4] Donsker has given the following theorem: Let $R$ be the space of real, single-valued functions $g(t)$ which are continuous except for at most a finite number of finite jumps, and let $C$ be the space of continuous functions. Let $F(g)$ be a functional defined on $R$ and continuous in the uniform topology, i.e., $\sup_{0 \le t \le 1} |g_n(t) - g_0(t)| \to 0$, $n \to \infty$, implies $|F(g_n) - F(g_0)| \to 0$, $n \to \infty$, $g_n \in R$, $g_0 \in C$, except for a set of $g_0(t)$ with 0 probability according to the probability associated with $y(t)$. Then

$$\lim_{n \to \infty} \Pr\{F[Y_n(t)] \le z\} = \Pr\{F[y(t)] \le z\}. \tag{3.9}$$

It follows from this result that if $\psi(u)$ is bounded $A(z) = a(z)$ and $B(z) = b(z)$.

To handle more general weight functions for the case of integrals we want to extend this result. We shall assume that $\psi(u)$ is continuous in any interval $0 < u_1 \le u \le u_2 < 1$. Secondly we assume that

$$\int_0^{u_1} \psi(t)\, t \log\log\frac{1}{t}\, dt, \qquad \int_{u_1}^{1} \psi(t)(1 - t) \log\log\frac{1}{1 - t}\, dt \tag{3.10}$$

exist for every u; (0 < u, < 1). It is shown in Section 5 that 
(1 + dy(t/1 +d) = X(d) 


is the Wiener process which has the property ({12] p. 242 and p. 247) 


(3.11) Pr there exists a t) such that X°(#) < 2¢ log log for0 <t< ts} = 1, 


This implies that 


( 
Pr \ there exists a uw such that 
(3.12) 
2 sa l—wu \ 
y (u) < 2u(1 — u) log log - — for0 <u < Uo =1. 
t 


Thus with probability 1 y(‘y(Q) is majorized by ky()doglog(1/t) for 
k = 2(1 — uw). Thus if the first integral in (3.10) exists 


(3.13) I a y’ (t)y(t) at 


exists with probability 1 (taking the principal value when the integral is im- 
proper). A similar argument holds for the existence of 


(3.14) / y°(tW(t) dt. 





Thus $\int_0^1 y^2(t)\psi(t)\, dt$ exists with probability 1. This defines a functional continuous in the uniform topology. Hence from Donsker's theorem $A(z) = a(z)$.


4. The limiting distribution of the integral criterion. In this section we show how to find $a(z)$ in terms of the solution of a certain differential equation and give two examples of this method. The statistic $W_n^2$ is essentially that introduced by Cramér [2]; in the case of $\psi(t) = 1$, it is $n$ times the $\omega^2$ criterion studied by von Mises [19] and Smirnov [17].

The method we use is analogous to the technique of Kac and Siegert [10]. We shall sketch briefly the extension of their results.

By Mercer's theorem a symmetric continuous correlation function $k(s, t)$, $0 \le s, t \le 1$, which is square integrable (in one variable and in both variables), can be expressed as

$$k(s, t) = \sum_{j=1}^{\infty} \frac{1}{\lambda_j} f_j(s) f_j(t), \tag{4.1}$$

where $\lambda_j$ is an eigenvalue and $f_j(t)$ is the corresponding normalized eigenfunction of the integral equation

$$\lambda \int_0^1 k(s, t) f(s)\, ds = f(t), \tag{4.2}$$

and

$$\int_0^1 f_i(t) f_j(t)\, dt = \delta_{ij}, \tag{4.3}$$

the Kronecker delta. In most cases $k(0, 0) = k(1, 1) = 0$; hence $f_j(0) = f_j(1) = 0$. Since $k(s, t)$ is positive definite, $\lambda_j > 0$. The series (4.1) converges absolutely and uniformly in the unit square.
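For the weight function $\psi = 1$ the kernel is $k(s, t) = \min(s, t) - st$, and its Mercer expansion is known in closed form: the eigenvalues are $\lambda_j = j^2\pi^2$ with normalized eigenfunctions $f_j(t) = \sqrt{2}\sin(j\pi t)$. These are standard facts about this kernel and are supplied here for illustration. The sketch below compares a truncated series with the kernel; the function names and the truncation point are arbitrary.

```python
from math import pi, sin


def kernel(s, t):
    """Covariance kernel min(s, t) - s*t, i.e. the case psi = 1."""
    return min(s, t) - s * t


def mercer_partial_sum(s, t, terms=2000):
    """Truncated bilinear series: sum of f_j(s) f_j(t) / lambda_j for j <= terms."""
    return sum(
        2.0 * sin(j * pi * s) * sin(j * pi * t) / (j * pi) ** 2
        for j in range(1, terms + 1)
    )


for s, t in [(0.3, 0.7), (0.5, 0.5), (0.1, 0.9)]:
    print(s, t, kernel(s, t), mercer_partial_sum(s, t))
```

The truncation error after $J$ terms is bounded by $\sum_{j > J} 2/(j^2\pi^2)$, about $2/(\pi^2 J)$, which matches the absolute and uniform convergence asserted for (4.1).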

Let $X_1, X_2, \ldots$ be independently, normally distributed with means zero and variances 1. If $k(t, t) < \infty$, then we can define

$$z(t) = \sum_{j=1}^{\infty} \frac{1}{\sqrt{\lambda_j}}\, X_j f_j(t); \tag{4.4}$$

the series converges in the mean and with probability one for each $t$. Then $z(t)$ is a Gaussian process with $Ez(t) = 0$ and $Ez(s)z(t) = k(s, t)$. Thus $z(t)$ gives the same stochastic process as $\sqrt{\psi(t)}\, y(t)$ when $k(s, t) = \sqrt{\psi(s)}\sqrt{\psi(t)}\,[\min(s, t) - st]$. From this it follows that with probability 1

$$W^2 = \int_0^1 \psi(t) y^2(t)\, dt = \int_0^1 z^2(t)\, dt = \int_0^1 \Big[\sum_{j=1}^{\infty} \frac{X_j f_j(t)}{\sqrt{\lambda_j}}\Big]^2 dt = \sum_{j=1}^{\infty} \frac{X_j^2}{\lambda_j}. \tag{4.5}$$

See [10] for details of this proof. Thus

$$E[e^{iuW^2}] = E\Big[\exp\Big(iu \sum_{j=1}^{\infty} X_j^2/\lambda_j\Big)\Big] = \prod_{j=1}^{\infty} E[\exp(iuX_j^2/\lambda_j)] = \prod_{j=1}^{\infty} (1 - 2iu/\lambda_j)^{-1/2}. \tag{4.6}$$

The infinite product converges absolutely and uniformly for all real $u$, and in general $1/\lambda_n = O(1/n^2)$.

We desire a more general result, however, because one weight function we treat leads to a kernel that is not continuous at $(0, 0)$ and $(1, 1)$. We use the following theorem of Hammerstein [9]: Let $k(s, t)$ be continuous in the unit square except possibly at the corners of the square; let $\partial k(s, t)/\partial s$ be continuous in the interior of both triangles in which the square is divided by the line between $(0, 0)$ and $(1, 1)$, and let the partial derivative be bounded in the domain $\epsilon \le s \le 1 - \epsilon$ and $0 \le t \le 1$ for each $\epsilon$ ($> 0$). Then the series on the right of (4.1) converges uniformly to $k(s, t)$ in every domain in the interior of the unit square.

Since $k(s, t) = \sqrt{\psi(s)}\sqrt{\psi(t)}\,[\min(s, t) - st]$, the condition is that $\psi(t)$ be continuous for $0 < t < 1$ and

$$\frac{t\sqrt{\psi(t)}}{\sqrt{\psi(s)}}\, \big[\tfrac{1}{2}(1 - s)\psi'(s) - \psi(s)\big] \tag{4.7}$$

be continuous for $0 \le t \le s \le 1 - \epsilon$ and

$$\frac{(1 - t)\sqrt{\psi(t)}}{\sqrt{\psi(s)}}\, \big[\tfrac{1}{2}s\psi'(s) + \psi(s)\big] \tag{4.8}$$

be continuous for $\epsilon \le s \le t \le 1$.
In this case (4.4) converges in the mean and with probability one for every $t$ ($\epsilon \le t \le 1 - \epsilon$), and $z(t)$ is the same process as $\sqrt{\psi(t)}\, y(t)$ in this interval. If $\int_0^1 k(t, t)\, dt < \infty$, then $\sum_{j=1}^{\infty} 1/\lambda_j < \infty$ (by Bessel's inequality) and $\sum_{j=1}^{\infty} X_j^2/\lambda_j$ converges with probability one. Further, with probability one, $\sum_{j=1}^{\infty} X_j f_j(t)/\sqrt{\lambda_j}$ converges in the mean (integral with respect to $t$) and it converges to $z(t)$. Thus we have with probability one

$$\int_0^1 z^2(t)\, dt = \sum_{j=1}^{\infty} X_j^2/\lambda_j, \tag{4.9}$$


a0) | 2 at = / 20 at = / a |S 7 X; fs) | ae 


For ¢ small enough 


(4.11) el [ a(t) dt + [ 2) at| = [ k(t, t) dt + . k(t, 2) dt <6 







1 
for any 6 > 0. Thus the distribution of W* = [ z’(t) dt is the limiting distribu- 
0 


1l-e 
tion of / z’(t) dt. With a similar argument for the integral of z*(¢) we argue 


that the distribution of W’ is the distribution of }>%, X3/A; with characteristic 
function (4.6). 
THEOREM 4.1. If

$$k(s, t) = \sqrt{\psi(s)}\sqrt{\psi(t)}\,[\min(s, t) - st] \tag{4.12}$$

is continuous or if $k(s, t)$ is continuous except at $(0, 0)$ and $(1, 1)$ with $\partial k(s, t)/\partial s$ continuous for $0 < s, t < 1$, $s \ne t$, and bounded in $\epsilon \le s \le 1 - \epsilon$, $0 \le t \le 1$ for every $\epsilon$ ($> 0$) then the characteristic function of $W^2$ is given by (4.6), where $\{\lambda_j\}$ are the eigenvalues of $k(s, t)$ defined by (4.2).

In our case the integral equation is

$$f(t) = \lambda \int_0^1 [\min(t, s) - ts]\, \sqrt{\psi(t)}\sqrt{\psi(s)}\, f(s)\, ds. \tag{4.13}$$

It can be shown that if $f(t)$ satisfies (4.13) for some $\lambda$, then $h(t) = f(t)\psi^{-1/2}(t)$ satisfies

$$h''(t) + \lambda\psi(t)h(t) = 0 \tag{4.14}$$

for that $\lambda$ (see [8], Sections 604 and 605) and $h(0) = h(1) = 0$ when $k(0, 0) = k(1, 1) = 0$. Let $h(t, \lambda)$ be the solution of (4.14) for which

$$h(0, \lambda) = 0, \qquad \frac{\partial h(t, \lambda)}{\partial t}\Big|_{t=0} = 1. \tag{4.15}$$

If $\psi(t)$ is continuous ($0 \le t \le 1$), such a solution exists and $h(t, \lambda)$ is continuous in ($0 \le t \le 1$). Since $h(1, \lambda) = 0$ for $\lambda$ an eigenvalue of (4.13), the roots of $h(1, \lambda) = 0$ are the roots of the Fredholm determinant $D(\lambda)$ associated with $k(s, t)$. It can be shown that

$$D(\lambda) = \frac{h(1, \lambda)}{h(1, 0)} = \prod_{j=1}^{\infty} \Big(1 - \frac{\lambda}{\lambda_j}\Big). \tag{4.16}$$

The characteristic function (4.6) is

$$\frac{1}{\sqrt{D(2it)}}. \tag{4.17}$$

The square root is taken so as to make (4.17) real and positive when the characteristic function is real and positive. The details of this proof are given in [8], Section 605.

THEOREM 4.2. Let $\psi(t)$ be continuous for $0 \le t \le 1$. Then the equation (4.14) has a unique solution $h(t, \lambda)$ for every $\lambda > 0$ satisfying (4.15). Then the characteristic function of $W^2$ is

$$\sqrt{\frac{h(1, 0)}{h(1, 2it)}}. \tag{4.18}$$

Thus we have reduced the problem of finding the characteristic function of $W^2$ to finding the general solution of a differential equation.

The semi-invariants $\kappa_n$ of $W^2$ are given quite easily (when they exist) through the eigenvalues. Since

(4.19)  $\phi(t) = \prod_{j=1}^{\infty} (1 - 2it/\lambda_j)^{-1/2}$,

the coefficient of $(it)^n/n!$ in the power series expansion of $\log \phi(t)$ is

(4.20)  $\kappa_n = 2^{n-1}(n - 1)! \sum_{j=1}^{\infty} \left(\dfrac{1}{\lambda_j}\right)^n$,  $n = 1, 2, \cdots$.

Hence we obtain for the mean and variance, for instance,

(4.21)  $\kappa_1 = \mu = \sum_{j=1}^{\infty} \dfrac{1}{\lambda_j}$,  $\kappa_2 = \sigma^2 = 2\sum_{j=1}^{\infty} \left(\dfrac{1}{\lambda_j}\right)^2$.

Even without knowing the eigenvalues, the moments can be calculated in terms of the iterates of the kernel $k(s, t)$. Putting $k_1(s, t) = k(s, t) = (\min(s, t) - st)\sqrt{\psi(s)\psi(t)}$, $k_{n+1}(s, t) = \int_0^1 k_n(s, u)\,k(u, t)\,du$, we have by means of the bilinear expansion

(4.22)  $k_n(s, t) = \sum_{j=1}^{\infty} \lambda_j^{-n} f_j(s) f_j(t)$.

Hence,

(4.23)  $\kappa_n = 2^{n-1}(n - 1)! \int_0^1 k_n(s, s)\,ds$

and, in particular,

(4.24)  $\mu = \int_0^1 k(s, s)\,ds = \int_0^1 t(1 - t)\,\psi(t)\,dt$,

        $\sigma^2 = 2\int_0^1\!\!\int_0^1 k^2(s, t)\,ds\,dt = 4\int_0^1 (1 - s)^2\,\psi(s) \int_0^s t^2\,\psi(t)\,dt\,ds$.
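As a numerical illustration that is not part of the original text, the following Python sketch evaluates the mean and variance of $W^2$ for the weight function $\psi = 1$ in two ways: from the kernel integrals in (4.24) by a midpoint rule, and from the eigenvalue sums in (4.21). The function names, the grid size, and the truncation point are assumptions made for the illustration, as is the choice $\lambda_j = \pi^2 j^2$, which are the zeros of $\sin\sqrt{\lambda}/\sqrt{\lambda}$ for this kernel. Both routes should give values close to 1/6 and 1/45.

```python
import math

def kernel(s, t):
    """k(s, t) = (min(s, t) - s*t) * sqrt(psi(s) * psi(t)) with psi identically 1."""
    return min(s, t) - s * t

def moments_from_kernel(n_grid=400):
    """Mean and variance via the kernel integrals, midpoint rule on an n_grid x n_grid mesh."""
    h = 1.0 / n_grid
    pts = [(i + 0.5) * h for i in range(n_grid)]
    mean = h * sum(kernel(s, s) for s in pts)
    var = 2.0 * h * h * sum(kernel(s, t) ** 2 for s in pts for t in pts)
    return mean, var

def moments_from_eigenvalues(n_terms=100000):
    """Mean and variance via truncated eigenvalue sums, assuming lambda_j = (pi * j)**2."""
    inv = [1.0 / (math.pi * j) ** 2 for j in range(1, n_terms + 1)]
    return sum(inv), 2.0 * sum(v * v for v in inv)

if __name__ == "__main__":
    print(moments_from_kernel())
    print(moments_from_eigenvalues())
```

The kernel route needs no knowledge of the spectrum, which is the point made in the text; the eigenvalue route is included only as a cross-check.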
We now present two applications of this method.

Example 1. Let $\psi(t) = 1$; then $W_n^2 = n\omega^2$. The differential equation $h''(t) + \lambda h(t) = 0$ has a solution

(4.25)  $h(t, \lambda) = \dfrac{1}{\sqrt{\lambda}} \sin \sqrt{\lambda}\,t$

satisfying (4.15). Taking $h(1, 0)$ as $\lim_{\lambda \to 0} h(1, \lambda) = 1$, we find that $1/\sqrt{D(2it)}$ is

(4.26)  $\phi_1(t) = E\,e^{itW^2} = \sqrt{\dfrac{\sqrt{2it}}{\sin\sqrt{2it}}} = \sqrt{\dfrac{\sqrt{-2it}}{\sinh\sqrt{-2it}}}$.







This expression was given by Smirnov [15] and later by von Mises [20] using entirely different methods. A formal method for finding the distribution (by inverting the Fourier transform) was given later by Smirnov [16], but his expression is not amenable to numerical calculation. The following procedure expresses $a_1(z) = \Pr\{W^2 \le z\}$ in terms of tabulated functions.

It appears convenient to work with the Laplace transform. We have

(4.27)  $\Phi(t) = \phi_1(it) = E(e^{-tW^2}) = \sqrt{\dfrac{\sqrt{2t}}{\sinh\sqrt{2t}}}$.

Using integration by parts, we obtain

(4.28)  $\int_0^\infty e^{-tz}\,a_1(z)\,dz = \dfrac{1}{t}\,\Phi(t)$

for the cdf $a_1(z)$. We wish to invert this Laplace transform. Now

(4.29)  $\dfrac{1}{t}\,\Phi(t) = \left(\dfrac{2}{t}\right)^{3/4} e^{-\sqrt{2t}/2}\left(1 - e^{-2\sqrt{2t}}\right)^{-1/2}$.

We suppose in the sequel that the real part of $t$, $R(t) > 0$ and apply the binomial expansion to the last expression; thus

(4.30)  $\dfrac{1}{t}\,\Phi(t) = \left(\dfrac{2}{t}\right)^{3/4} \sum_{j=0}^{\infty} (-1)^j \binom{-1/2}{j}\, e^{-(4j+1)\sqrt{2t}/2}$,

where $\binom{-1/2}{j} = (-1)^j\,\Gamma(j + \tfrac12)/[\Gamma(\tfrac12)\,j!]$. It may be readily verified that the complex inversion formula can be used termwise here since the abscissa of convergence of $\Phi(t)/t$ is $R(t) = 0$, and the above series converges absolutely and uniformly in the half plane $R(t) \ge \delta > 0$.

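Since

(4.31)  $e^{-A\sqrt{t}} = \int_0^\infty e^{-st}\,\dfrac{A}{2\sqrt{\pi}\,s^{3/2}}\,e^{-A^2/(4s)}\,ds$,  $\dfrac{1}{t^{3/4}} = \int_0^\infty e^{-st}\,\dfrac{ds}{\Gamma(3/4)\,s^{1/4}}$,

we have

(4.32)  $\dfrac{e^{-A\sqrt{t}}}{t^{3/4}} = \int_0^\infty e^{-zt}\,\zeta(z)\,dz$,

where

(4.33)  $\zeta(z) = \dfrac{A}{2\sqrt{\pi}\,\Gamma(3/4)} \int_0^z \dfrac{e^{-A^2/(4x)}}{x^{3/2}\,(z - x)^{1/4}}\,dx$

by virtue of the convolution property of the Laplace transform. In this integral we change variables, putting $x = z\,\mathrm{sech}^2\theta$ to give

(4.34)  $\zeta(z) = \dfrac{A}{\sqrt{\pi}\,\Gamma(3/4)\,z^{3/4}} \int_0^\infty e^{-(A^2/(4z))\cosh^2\theta}\,(\cosh\theta\,\sinh\theta)^{1/2}\,d\theta$

        $= \dfrac{A\,e^{-A^2/(8z)}}{2\sqrt{2\pi}\,\Gamma(3/4)\,z^{3/4}} \int_0^\infty e^{-(A^2/(8z))\cosh\varphi}\,(\sinh\varphi)^{1/2}\,d\varphi$

        $= \dfrac{\sqrt{A}\,e^{-A^2/(8z)}}{\pi\sqrt{2z}}\,K_{1/4}\!\left(\dfrac{A^2}{8z}\right)$,

where $K_{1/4}(x)$ is the standard Bessel function [22].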
Having inverted the typical term, we finally obtain by summing

(4.35)  $a_1(z) = \dfrac{1}{\pi\sqrt{z}} \sum_{j=0}^{\infty} (-1)^j \binom{-1/2}{j} (4j + 1)^{1/2}\, e^{-(4j+1)^2/(16z)}\, K_{1/4}\!\left(\dfrac{(4j + 1)^2}{16z}\right)$.
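The series (4.35) is straightforward to evaluate by machine. The sketch below is an illustration added here, not the computation used for the published table: it assumes the reading of (4.35) as $a_1(z) = (\pi\sqrt{z})^{-1}\sum_j (-1)^j\binom{-1/2}{j}(4j+1)^{1/2}e^{-x_j}K_{1/4}(x_j)$ with $x_j = (4j+1)^2/(16z)$, and it obtains $K_{1/4}$ from the standard integral representation $K_\nu(x) = \int_0^\infty e^{-x\cosh t}\cosh(\nu t)\,dt$ by a trapezoid rule. The function names, step size, and truncation limits are hypothetical choices.

```python
import math

def bessel_k(nu, x, step=0.01, upper=30.0):
    """Modified Bessel function K_nu(x), x > 0, by the trapezoid rule applied to
    K_nu(x) = integral over t in [0, inf) of exp(-x cosh t) cosh(nu t)."""
    n = int(upper / step)
    total = 0.5 * math.exp(-x)  # t = 0 endpoint carries half weight
    for i in range(1, n + 1):
        t = i * step
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * step

def a1(z, terms=10):
    """Limiting cdf of n*omega^2 from the series, truncated after `terms` terms."""
    total = 0.0
    for j in range(terms):
        # (-1)^j * binom(-1/2, j) = Gamma(j + 1/2) / (Gamma(1/2) * j!), which is positive
        coeff = math.gamma(j + 0.5) / (math.gamma(0.5) * math.factorial(j))
        x = (4 * j + 1) ** 2 / (16.0 * z)
        total += coeff * math.sqrt(4 * j + 1) * math.exp(-x) * bessel_k(0.25, x)
    return total / (math.pi * math.sqrt(z))

if __name__ == "__main__":
    for z in (0.46136, 0.74346):
        print(z, a1(z))
```

Evaluated at the tabulated 5% and 1% significance points, z = .46136 and z = .74346, the sketch should return values close to .95 and .99, with the first term of the series carrying almost all of the total.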


The convergence of this series is very rapid. If $a_1(z) = \sum_{j=0}^{\infty} u_j(z)$, we find that $u_{j+1}(z)/u_j(z) < k_j\,e^{-(4j+3)/(2z)}$ (using the fact that $K_{1/4}(t)$ is a decreasing function of $t$), where $k_0 < 1.12$, $k_1 < 1.007$, $k_2 < 1.002$, $k_j < 1.0007$ for $j \ge 3$. Since
K,(t) is positive, u;(z) > 0. Using a crude geometric series bound for R,(z) = 
>>%.4u;(z), we can show that for z < 2, R,(z) < .0002. Moreover, for z < 2, 
R,(z) < us(z) < w(z) < wm(z). In computation, therefore, one need only take 
as many terms in the series as are different from 0 in the number of decimal 
places carried. We give below a table of z for equal increments (.01) of a;(z) 
with the 5%, 1% and .1% significance points. The calculations have been car- 
ried to 6 figures before rounding off. The authors are indebted to Mr. Jack 
Laderman of Columbia University and the Numerical Analysis Department of 
the Rand Corporation for their assistance in preparing the table. 

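The semi-invariants of this distribution are easily obtained since the eigenvalues satisfy $1/\lambda_j = 1/(\pi^2 j^2)$. Thus

(4.36)  $\kappa_n = \dfrac{2^{n-1}(n - 1)!}{\pi^{2n}} \sum_{j=1}^{\infty} \dfrac{1}{j^{2n}} = \dfrac{2^{3n-2}(n - 1)!}{(2n)!}\,B_n$,

where $B_n$ are the Bernoulli numbers: $B_1 = 1/6$, $B_2 = 1/30$, etc.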
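For a quick check of the Bernoulli-number form of the semi-invariants, the following Python sketch (an added illustration, with a hypothetical function name) evaluates $\kappa_n = 2^{3n-2}(n-1)!\,B_n/(2n)!$ exactly with rational arithmetic, taking $B_1 = 1/6$, $B_2 = 1/30$ from the text and $B_3 = 1/42$ as the next value in the same convention. The closed form itself is an assumption insofar as it is obtained here from (4.20) together with the classical value of $\sum_j j^{-2n}$.

```python
from fractions import Fraction
from math import factorial

# Bernoulli numbers in the older convention used in the text (all positive).
BERNOULLI = {1: Fraction(1, 6), 2: Fraction(1, 30), 3: Fraction(1, 42)}

def semi_invariant(n):
    """kappa_n = 2^(3n-2) * (n-1)! / (2n)! * B_n for the limiting law of n*omega^2."""
    return Fraction(2 ** (3 * n - 2) * factorial(n - 1), factorial(2 * n)) * BERNOULLI[n]

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, semi_invariant(n))
```

The first two values are the mean 1/6 and the variance 1/45 of the limiting distribution.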
Example 2. $\psi(t) = 1/[t(1 - t)]$. Since the variance of $Y_n(t) = \sqrt{n}\,[G_n(t) - t]$ is $t(1 - t)$, an interesting weight function for $Y_n^2(t)$ is the reciprocal of this variance.² In a certain sense, this function assigns to each point of the distribution


2 This suggestion was first made by L. J. Savage. 





TABLE 1
Limiting Distribution of $n\omega^2$
$a_1(z) = \lim_{n \to \infty} \Pr\{n\omega^2 \le z\}$

      z       a_1(z)  |     z       a_1(z)  |     z       a_1(z)
 (illegible)   .01    |   .08562     .34    |   .17159     .67
   .02878      .02    |   .08744     .35    |   .17568     .68
   .03177      .03    |   .08928     .36    |   .17992     .69
   .03430      .04    |   .09115     .37    |   .18433     .70
   .03656      .05    |   .09306     .38    |   .18892     .71
   .03865      .06    |   .09499     .39    |   .19371     .72
   .04061      .07    |   .09696     .40    |   .19870     .73
   .04247      .08    |   .09896     .41    |   .20392     .74
   .04427      .09    |   .10100     .42    |   .20939     .75
   .04601      .10    |   .10308     .43    |   .21512     .76
   .04772      .11    |   .10520     .44    |   .22114     .77
   .04939      .12    |   .10736     .45    |   .22748     .78
   .05103      .13    |   .10956     .46    |   .23417     .79
   .05265      .14    |   .11182     .47    |   .24124     .80
   .05426      .15    |   .11412     .48    |   .24874     .81
   .05586      .16    |   .11647     .49    |   .25670     .82
   .05746      .17    |   .11888     .50    |   .26520     .83
   .05904      .18    |   .12134     .51    |   .27429     .84
   .06063      .19    |   .12387     .52    |   .28406     .85
   .06222      .20    |   .12646     .53    |   .29460     .86
   .06381      .21    |   .12911     .54    |   .30603     .87
   .06541      .22    |   .13183     .55    |   .31849     .88
   .06702      .23    |   .13463     .56    |   .33217     .89
   .06863      .24    |   .13751     .57    |   .34730     .90
   .07025      .25    |   .14046     .58    |   .36421     .91
   .07189      .26    |   .14350     .59    |   .38331     .92
   .07354      .27    |   .14663     .60    |   .40520     .93
   .07521      .28    |   .14986     .61    |   .43077     .94
   .07690      .29    |   .15319     .62    |   .46136     .95
   .07860      .30    |   .15663     .63    |   .49929     .96
   .08032      .31    |   .16018     .64    |   .54885     .97
   .08206      .32    |   .16385     .65    |   .61981     .98
   .08383      .33    |   .16765     .66    |   .74346     .99
                      |                     |  1.16786     .999

$F(x)$ equal weights. A statistician may prefer to use this weight function when he feels that $\psi(t) = 1$ does not give enough weight to the tails of the distribution.







In this example

$k(t, s) = \dfrac{\sqrt{t(1 - s)}}{\sqrt{(1 - t)s}}$ for $t \le s$,  $k(t, s) = \dfrac{\sqrt{(1 - t)s}}{\sqrt{t(1 - s)}}$ for $t \ge s$,

is not continuous at $(t, s) = (0, 0)$ or $(1, 1)$; hence we need the extended result
of Theorem 4.1 to justify our procedure. It is known that the Ferrer associated 
Legendre polynomials f,(t) = Pi(t) = «(1 — t)P;(2t — 1) satisfy the integral 
equation with $1/\lambda_i = 1/[i(i + 1)]$ (see [23], p. 324). Thus the characteristic function of $W^2$ is

(4.37)  $\phi_2(t) = \prod_{j=1}^{\infty} \left(1 - \dfrac{2it}{j(j + 1)}\right)^{-1/2} = \sqrt{\dfrac{-2\pi i t}{\cos\left(\dfrac{\pi}{2}\sqrt{1 + 8it}\right)}}$.
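The closed form in (4.37) rests on the product identity $\prod_{j \ge 1} (1 - \lambda/(j(j+1))) = -\cos(\tfrac{\pi}{2}\sqrt{1 + 4\lambda})/(\pi\lambda)$, which follows from the reflection formula for the gamma function. The Python sketch below, added as an illustration with hypothetical function names and truncation point, compares a truncated product with the closed form at $\lambda = 2it$, $t = 1$; the truncation error is of relative order $|\lambda|/N$ for $N$ factors.

```python
import cmath

def fredholm_product(lam, n_terms=20000):
    """Truncated product of (1 - lam / (j (j + 1))) over j = 1, ..., n_terms."""
    prod = 1.0 + 0.0j
    for j in range(1, n_terms + 1):
        prod *= 1.0 - lam / (j * (j + 1))
    return prod

def fredholm_closed_form(lam):
    """-cos((pi / 2) * sqrt(1 + 4 lam)) / (pi * lam); cosine is even, so the branch of sqrt is immaterial."""
    return -cmath.cos(0.5 * cmath.pi * cmath.sqrt(1.0 + 4.0 * lam)) / (cmath.pi * lam)

if __name__ == "__main__":
    lam = 2j  # lambda = 2 i t with t = 1
    print(fredholm_product(lam), fredholm_closed_form(lam))
```

The characteristic function is then the reciprocal square root of this determinant, with the branch fixed as described after (4.17).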


An analysis similar to that used in Example 1 shows that the cdf, a2(z), can be 
expressed as 


az) = PriW’ Ss 2} 


1 1 
oe $ : (ra) /8—((45+1)2 #2) / (Bre) 
\ i (4j + » | e rz (4j+1) 28 8rz 5 ¥ 
0 \ 


Z j=0 


a 
2r +1) 29? z/(8(w2+1))—((45+1)2 x 2w2) /(8 
~ vm lao (47 + Le (45+1)2 x2) / (82) es 1 9 48w2)/Bs) 7, 
0 


7=0 


5. Theory of deviations. The second test criterion led to the calculation of

$B_n(z) = \Pr\{\sup_{0 \le u \le 1} \sqrt{n}\,|G_n(u) - u|\,\sqrt{\psi(u)} \le z\}$.

In order to handle the limiting distribution we consider the functional

(5.1)  $K = \sup_{0 \le u \le 1} |y(u)|\,\sqrt{\psi(u)}$.

It follows from the theorem of Donsker [4] that for $\psi(u)$ bounded we have

$\lim_{n \to \infty} B_n(z) = \Pr\{K \le z\}$,

and the problem is reduced to that of calculating the distribution of (5.1). This is the elegant idea of Doob [6], who treated the case $\psi = 1$.

This is known as an "absorption probability" problem on account of its very suggestive analogy with a simple diffusion model. It is clear that the event $\{-z(\psi(u))^{-1/2} \le y(u) \le z(\psi(u))^{-1/2},\ 0 \le u \le 1\}$ is equivalent to the event $\{K \le z\}$; thus the probability $b(z)$ is, very crudely speaking, the "proportion" of all those paths $y(u)$ of the diffusing particle which do not get "absorbed into" (i.e., intersect) the "barriers" $y = \pm z(\psi(u))^{-1/2}$.


It is convenient to make a transformation due to Doob [6] which renders the analysis simpler. If we put

$X(t) = (1 + t)\,y\!\left(\dfrac{t}{1 + t}\right)$,

it is easy to verify that $X(t)$ is the Wiener-Einstein process; that is, $X(t)$ is Gaussian, $X(0) = 0$, $E(X(t)) = 0$, $E(X(t)X(s)) = \min(s, t)$. Then

$b(z) = \Pr\{|X(t)| \le \xi(t),\ 0 \le t < \infty\}$,

where

(5.2)  $\xi(t) = \dfrac{z(1 + t)}{\sqrt{\psi\!\left(\dfrac{t}{1 + t}\right)}}$.

Thus we have the absorption probability problem for the free particle with barriers $x = \pm\xi(t)$ for $t \ge 0$.
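The covariance claim for Doob's transformation can be checked mechanically. The short Python sketch below is an added illustration: it assumes, as in Section 4, that $y(u)$ has covariance $\min(u, v) - uv$, and it confirms that $(1 + s)(1 + t)$ times this covariance at $u = s/(1 + s)$, $v = t/(1 + t)$ reduces to $\min(s, t)$. The function names are hypothetical.

```python
def bridge_cov(u, v):
    """Covariance of the limiting process y: E[y(u) y(v)] = min(u, v) - u v."""
    return min(u, v) - u * v

def transformed_cov(s, t):
    """Covariance of X(t) = (1 + t) y(t / (1 + t)) at times s and t."""
    return (1.0 + s) * (1.0 + t) * bridge_cov(s / (1.0 + s), t / (1.0 + t))

if __name__ == "__main__":
    for s, t in [(0.3, 2.0), (5.0, 0.1), (1.5, 1.5)]:
        print(s, t, transformed_cov(s, t), min(s, t))
```

Algebraically, for $s \le t$ the product is $(1 + s)(1 + t)\,[s/(1 + s)]\,[1/(1 + t)] = s$, which is what the numbers reproduce.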


The method of solution is to treat the corresponding diffusion problem as a boundary value problem with the diffusion equation

(5.3)  $\dfrac{\partial f}{\partial t} = \dfrac{1}{2}\,\dfrac{\partial^2 f}{\partial x^2}$

associated with the region $t \ge 0$, $|x| \le \xi(t)$. In line with the preceding analogy $f(t, x)$ will be the "density" of paths $X(u)$ which for $0 \le u \le t$ have not been "absorbed" and for which $X(t) = x$; hence

$\int_{|x| < \xi(t)} f(t, x)\,dx$

will give the probability of nonabsorption up to time $t$. It is the limit of this expression for $t \to \infty$ which will yield $b(z)$. For a more detailed discussion of these points see Lévy [12], pp. 78 et seq.


We need the following existence and uniqueness theorem:

THEOREM 5.1. Given that $\xi(t)$ of (5.2) has a bounded derivative for $t_0 \le t \le t_1$, there exists a unique function $p(t_0, y; t, x)$ such that for any continuous function $g(y)$, $|y| < \xi(t_0)$, the function

(5.4)  $f(t, x) = \int_{|y| < \xi(t_0)} g(y)\,p(t_0, y; t, x)\,dy$

has the following properties:
(1) $f(t, x)$ satisfies (5.3) in the domain $t_0 < t < t_1$, $|x| < \xi(t)$,
(2) $\lim_{x \to \pm\xi(t)} f(t, x) = 0$,  $t_1 > t > t_0$,
(3) $\lim_{t \to t_0,\ x \to \eta} f(t, x) = g(\eta)$,  $|\eta| < \xi(t_0)$.


The proof of this theorem is contained quite explicitly in the fundamental paper of Fortet [7] (especially ch. V), who considers in great detail the general problem of absorption probabilities. Fortet treats only the case of one absorbing barrier, but his results are easily extended to the above case of two barriers. The differential $p(t_0, y; t, x)\,dx$ can be interpreted, to terms of order $(dx)^2$, as the probability that if the diffusing particle starts at $(t_0, y)$ it will not have been absorbed in the barriers $\pm\xi(t)$ during the interval $(t_0, t)$, and will lie between $x$ and $x + dx$ at time $t$.

We have not stated the best theorem possible. If $\xi(t)$ is merely continuous the absorption probability density $f(t, x)$ exists. For the existence of a solution to (5.3) satisfying (2) and (3) of Theorem 5.1 it is sufficient to require that $\xi(t)$ satisfy a Lipschitz condition associated with the law of the iterated logarithm. Finally we remark in passing that unless $f(t, x)$ is of the form (5.4) (the so called "normal" solution of Fortet) its uniqueness is not assured (cf. Doetsch [3]).

If in the theorem $\xi(t)$ has a bounded derivative for $t \ge 0$ then we plainly have

(5.5)  $b(z) = \lim_{t \to \infty} \int_{-\xi(t)}^{\xi(t)} p(0, 0; t, x)\,dx$,

but if $\xi(t)$ does not have a bounded derivative for $t \ge 0$, (5.5) can no longer be employed to determine $b(z)$. However, if there are a finite number of intervals in each of which $\xi(t)$ has a bounded derivative and between which $\xi(t)$ has a simple jump discontinuity it is easy to modify the above result; in fact over some of the intervals $\xi(t)$ may be infinite. A piecewise determination can be made and the solution can be continued to beyond the last discontinuity, and then (5.5) can be used. Suppose the points of discontinuity of $\xi(t)$ are $0 < t_1 < t_2 < \cdots < t_n$ and suppose $\xi(t)$ is, say, left continuous. In the region $(0, t_1)$ we have the solution $g_0(t, x) = p_0(0, 0; t, x)$ by the above theorem. Now if $\xi(t_1) < \xi(t_1 + 0)$ we define $g_1^*(t_1, x)$ by

$g_1^*(t_1, x) = g_0(t_1, x)$ for $|x| \le \xi(t_1)$,  $g_1^*(t_1, x) = 0$ for $\xi(t_1) < |x| \le \xi(t_1 + 0)$,

and if $\xi(t_1) > \xi(t_1 + 0)$ we define $g_1^*(t_1, x) = g_0(t_1, x)$, $|x| \le \xi(t_1 + 0)$. Then $g_1^*(t_1, x)$ is continuous in $|x| < \xi(t_1 + 0)$ and we have for $t_1 < t < t_2$ a function $g_1(t, x)$ defined by Theorem 5.1;

$g_1(t, x) = \int_{|y| < \xi(t_1 + 0)} g_1^*(t_1, y)\,p_1(t_1, y; t, x)\,dy$.

In the same way we can define a function $g_2^*(t_2, x)$ which will yield a function $g_2(t, x)$ for $t_2 < t < t_3$. This process will ultimately yield a unique function $g_n(t, x)$ for $t > t_n$. Finally

(5.6)  $b(z) = \lim_{t \to \infty} \int_{-\xi(t)}^{\xi(t)} g_n(t, x)\,dx$.

It is clear that if $\xi(t) = \infty$ in some of the intervals the successive determination of the functions $g_k(t, x)$ may still be carried forward. This would correspond to an absence of the absorbing barrier over the interval.

Using the relation (5.2) and the above remarks we have the following theorem for the weight function $\psi(u)$:

THEOREM 5.2. Suppose there is a finite sequence $0 = u_0 < u_1 < u_2 < \cdots < u_n < u_{n+1} = 1$ such that in the interval $(u_k, u_{k+1}]$ $\psi(t)$ is either (1) identically zero or (2) is bounded away from zero and has a bounded derivative. Then there exists a unique sequence of functions $\{p_k(t_k, y; t, x)\}$ such that for $t$ in the interval $u_k/(1 - u_k) = t_k < t < t_{k+1} = u_{k+1}/(1 - u_{k+1})$ the conclusions of Theorem 5.1 hold for the functions $p_k(t_k, y; t, x)$, $k = 0, 1, \cdots, n$, $\xi(t)$ being defined by (5.2).

From this theorem we can generate a set of functions $g_k(t, x)$, $t_k < t < t_{k+1}$, $k = 0, 1, \cdots, n$, and another set $g_k^*(t_k, x)$, $k = 1, 2, \cdots, n$, as before. $g_{k+1}^*(t_{k+1}, x)$ agrees with $g_k(t_{k+1}, x)$ over the set of $x$ for which the latter is defined; that is, $|x| < \xi(t_{k+1})$, and is zero for other values; namely, $\xi(t_{k+1} + 0) > |x| > \xi(t_{k+1})$ if $\xi(t)$ has a positive jump at $t_{k+1}$. Putting

$g_0(t, x) = p_0(0, 0; t, x)$,

$g_k(t, x) = \int_{|y| < \xi(t_k + 0)} g_k^*(t_k, y)\,p_k(t_k, y; t, x)\,dy$,  $t_k < t < t_{k+1}$,  $k = 1, 2, \cdots, n$,

we finally have (5.6) for $b(z)$.

In a formal way the problem is thus solved, but the analytical difficulties of getting an explicit solution may be prohibitive. If $\xi(t)$ consists of a set of linear arcs (which implies that $\sqrt{\psi(u)}$ is of the form $(\alpha u + \beta)^{-1}$ in a piecewise way) then $b(z)$ can be determined by quadratures (see, for example, Goursat [8], ch. 29, Ex. 3). We make an application of this remark below.

It is clear that if $\psi(u)$ becomes infinite for some $0 < u < 1$ then $b(z) = 0$ for every $z > 0$. But since $y(0) = y(1) = 0$ it is possible that $\psi(u)$ may become infinite for $u = 0$ or $1$ and still yield a nondegenerate $b(z)$. But in this case it is necessary that $\psi(u)$ not dominate $[2u(1 - u) \log\log 1/(u(1 - u))]^{-1}$ for $u$ near 0 or 1.

We shall consider several examples. 

Example 1. Let $\psi(u)$ be a constant over a set of intervals,

$\psi(u) = q_k \ge 0$,  $u_k < u \le u_{k+1}$,  $u_0 = 0$,  $u_{n+1} = 1$,  $k = 0, 1, \cdots, n$.


By choosing enough intervals, an arbitrary weight function can be approxi- 
mated, in a manner of speaking. 


It follows that the problem will be essentially solved if we can determine the functions $p_k(t_k, y; t, x)$ of Theorem 5.2. In this case the function $\xi(t)$ becomes, by (5.2),

$\xi(t) = \dfrac{z}{\sqrt{q_k}}\,(1 + t)$,  $\dfrac{u_k}{1 - u_k} < t \le \dfrac{u_{k+1}}{1 - u_{k+1}}$,

and we must find the solution to equation (5.3) which satisfies the conditions (2) and (3) of Theorem 5.1.


As before we put $t_k = u_k/(1 - u_k)$, and it follows by a classical procedure of superposing an infinite system of sources and sinks along the line $t = t_k$ that we may get the Green's solution. In fact, let us put a source at $t = t_k$, $x = y_j$, of strength $s_j$, where

$y_j = \dfrac{2jz}{\sqrt{q_k}}\,(1 + t_k) + (-1)^j y$,

$s_j = (-1)^j \exp\left\{-\dfrac{2z^2}{q_k}\,(t_k + 1)\,j^2 - \dfrac{2zj}{\sqrt{q_k}}\,(-1)^j y\right\}$,

for $j = 0, \pm 1, \pm 2, \cdots$. Then for $t_k < t \le t_{k+1}$ and $|y| < (z/\sqrt{q_k})(1 + t)$ we obtain

(5.7)  $p_k(t_k, y; t, x) = \sum_{j=-\infty}^{\infty} \dfrac{s_j\,e^{-\frac{1}{2}(x - y_j)^2/(t - t_k)}}{\sqrt{2\pi(t - t_k)}}$,

which may be directly verified by substitution to be a solution. It has been tacitly assumed that $q_k > 0$; if $q_k = 0$ we obtain only the term corresponding to $j = 0$ in the above solution, namely, the fundamental solution

$p_k(t_k, y; t, x) = \dfrac{e^{-(x - y)^2/(2(t - t_k))}}{\sqrt{2\pi(t - t_k)}}$.

Now on putting


$r_k = \min\left[\dfrac{z}{\sqrt{q_{k-1}}}\,(1 + t_k),\ \dfrac{z}{\sqrt{q_k}}\,(1 + t_k)\right]$,  $k = 1, 2, \cdots, n$,

and using the method outlined above, we obtain

$g_n(t, x) = \int_{-r_n}^{r_n} \cdots \int_{-r_1}^{r_1} p_0(0, 0; t_1, x_1)\,p_1(t_1, x_1; t_2, x_2) \cdots p_n(t_n, x_n; t, x)\,dx_1\,dx_2 \cdots dx_n$

for $p_k(t_k, x_k; t_{k+1}, x_{k+1})$ as in (5.7), and finally as an "explicit" solution,

$b_1(z) = \lim_{t \to \infty} \int_{|x| \le (z/\sqrt{q_n})(1 + t)} g_n(t, x)\,dx$.

The resulting function $b_1(z)$ is a multiply infinite sum of integrals of an $n$-variate Gaussian distribution over an $n$-dimensional rectangle.
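The source-and-sink construction behind (5.7) can be checked numerically. The Python sketch below is an illustration added here; it assumes image positions $y_j = 2jc(1 + t_k) + (-1)^j y$ and strengths $s_j = (-1)^j \exp\{-2c^2(1 + t_k)j^2 - 2cj(-1)^j y\}$ with $c = z/\sqrt{q_k}$, which is one reading of the displayed formulas, and it verifies that the resulting sum vanishes on both barriers $x = \pm c(1 + t)$. The function name, the truncation `j_max`, and the sample parameter values are hypothetical.

```python
import math

def image_density(x, t, y, t_k, z, q_k, j_max=50):
    """Sum of Gaussian sources and sinks started at time t_k from the image points y_j."""
    c = z / math.sqrt(q_k)
    total = 0.0
    for j in range(-j_max, j_max + 1):
        sign = -1.0 if j % 2 else 1.0  # (-1)^j, also correct for negative j
        y_j = 2.0 * j * c * (1.0 + t_k) + sign * y
        log_s = -2.0 * c * c * (1.0 + t_k) * j * j - 2.0 * c * j * sign * y
        total += sign * math.exp(log_s - (x - y_j) ** 2 / (2.0 * (t - t_k)))
    return total / math.sqrt(2.0 * math.pi * (t - t_k))

if __name__ == "__main__":
    z, q_k, t_k, y, t = 1.0, 1.0, 0.5, 0.3, 1.2
    barrier = z / math.sqrt(q_k) * (1.0 + t)
    print(image_density(barrier, t, y, t_k, z, q_k))   # numerically zero
    print(image_density(-barrier, t, y, t_k, z, q_k))  # numerically zero
    print(image_density(0.3, t, y, t_k, z, q_k))       # positive in the interior
```

The cancellation occurs in pairs: the terms $j$ and $1 - j$ cancel on the upper barrier, and the terms $j$ and $-1 - j$ cancel on the lower one.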







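We consider now the following special case of the above result:

$\psi(u) = 1$ for $0 \le a \le u \le b \le 1$,  $\psi(u) = 0$ otherwise.

Thus the test of the hypothesis is confined to detecting discrepancies over only a central portion of the interval $[0, 1]$. Using the preceding notation we have $n = 2$ and

$q_0 = 0$,  $q_1 = 1$,  $q_2 = 0$,  $t_1 = \dfrac{a}{1 - a}$,  $t_2 = \dfrac{b}{1 - b}$,

and hence

$p_0(0, 0; t_1, x_1) = \dfrac{e^{-x_1^2/(2t_1)}}{\sqrt{2\pi t_1}}$,

$p_1(t_1, x_1; t_2, x_2) = \sum_{j=-\infty}^{\infty} \dfrac{s_j\,e^{-\frac{1}{2}(x_2 - y_j)^2/(t_2 - t_1)}}{\sqrt{2\pi(t_2 - t_1)}}$,

(5.8)  $y_j = 2jz(t_1 + 1) + (-1)^j x_1$,  $s_j = (-1)^j \exp\{-2z^2(t_1 + 1)j^2 - 2zj(-1)^j x_1\}$,

$p_2(t_2, x_2; t, x) = \dfrac{e^{-\frac{1}{2}(x - x_2)^2/(t - t_2)}}{\sqrt{2\pi(t - t_2)}}$.

Thus, putting $b_1(z) = P(a, b, z)$,

$P(a, b, z) = \lim_{t \to \infty} \int_{-\infty}^{\infty} \int_{-z(1+t_2)}^{z(1+t_2)} \int_{-z(1+t_1)}^{z(1+t_1)} p_0(0, 0; t_1, x_1)\,p_1(t_1, x_1; t_2, x_2)\,p_2(t_2, x_2; t, x)\,dx_1\,dx_2\,dx$

$= \int_{-z(1+t_2)}^{z(1+t_2)} \int_{-z(1+t_1)}^{z(1+t_1)} p_0(0, 0; t_1, x_1)\,p_1(t_1, x_1; t_2, x_2)\,dx_1\,dx_2$

$= \sum_{j=-\infty}^{\infty} \dfrac{1}{2\pi\sqrt{t_1(t_2 - t_1)}} \int_{-z(1+t_2)}^{z(1+t_2)} \int_{-z(1+t_1)}^{z(1+t_1)} s_j \exp\left(-\dfrac{x_1^2}{2t_1} - \dfrac{(x_2 - y_j)^2}{2(t_2 - t_1)}\right) dx_1\,dx_2$

for $s_j$ and $y_j$ as in (5.8).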


The double integral is seen to be over a bivariate normal distribution, and if we let $n(x_1, x_2; \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)$ be the normal bivariate density in $x_1$, $x_2$ with means $\mu_1$, $\mu_2$, variances $\sigma_1^2$, $\sigma_2^2$ and correlation $\rho$ we obtain by rewriting the above integral

$P(a, b, z) = \sum_{j=-\infty}^{\infty} (-1)^j e^{-2z^2j^2} \int_{-z(1+t_1)}^{z(1+t_1)} \int_{-z(1+t_2)}^{z(1+t_2)} n(x_1, x_2; \mu_1^{(j)}, \mu_2^{(j)}, \sigma_1^2, \sigma_2^2, \rho_j)\,dx_2\,dx_1$,

where

$\mu_1^{(j)} = -2zj(-1)^j t_1$,  $\mu_2^{(j)} = 2zj$,  $\sigma_1^2 = t_1$,  $\sigma_2^2 = t_2$,  $\rho_j = (-1)^j \sqrt{t_1/t_2}$.

A somewhat simpler way of writing this result is as follows. Let $M(u, v, \xi, \eta, \rho)$ be the volume under the normal bivariate surface with means zero and variances 1 and correlation $\rho$ which is above the rectangle with vertices

$x = u \pm \xi$,
$y = v \pm \eta$.

Then, remembering that $t_1 = a/(1 - a)$, $t_2 = b/(1 - b)$ and $M(u, v, \xi, \eta, \rho) = M(-u, v, \xi, \eta, -\rho)$, we obtain after a simple transformation of the above integral

(5.9)  $P(a, b, z) = \sum_{j=-\infty}^{\infty} (-1)^j e^{-2j^2z^2}\, M\!\left(2jz\sqrt{\dfrac{a}{1 - a}},\ -2jz\sqrt{\dfrac{1 - b}{b}},\ \dfrac{z}{\sqrt{a(1 - a)}},\ \dfrac{z}{\sqrt{b(1 - b)}},\ \sqrt{\dfrac{a(1 - b)}{b(1 - a)}}\right)$.


There are tables available in which the function is tabulated; see also Pólya [14]. Also, if either $a = 0$ or $b = 1$ then $\rho = 0$ and the function can be calculated with the ordinary univariate Gaussian tables. Putting $a = 0$, $b = 1$ simultaneously we obtain Kolmogorov's result

$P(0, 1, z) = \sum_{j=-\infty}^{\infty} (-1)^j e^{-2j^2z^2}$,

which has been tabulated [18]. In the general case the convergence is very rapid and good results can be obtained by using a few central terms (in (5.9) the terms corresponding to $\pm j$ are clearly equal).
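The rapid convergence noted above is easy to see for Kolmogorov's series. The Python sketch below is an added illustration with a hypothetical function name; it uses the symmetry of the terms in $\pm j$ to write the sum as $1 + 2\sum_{j \ge 1} (-1)^j e^{-2j^2z^2}$.

```python
import math

def kolmogorov_limit(z, j_max=100):
    """P(0, 1, z): the two-sided series, folded onto j >= 1 by symmetry in j."""
    total = 1.0
    for j in range(1, j_max + 1):
        sign = -1.0 if j % 2 else 1.0
        total += 2.0 * sign * math.exp(-2.0 * j * j * z * z)
    return total

if __name__ == "__main__":
    for z in (0.5, 1.0, 1.3581, 2.0):
        print(z, kolmogorov_limit(z))
```

At z = 1.3581 the first term after the leading 1 is about -0.05 and the next is below $10^{-6}$, so two or three terms already fix the value near .95.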

The formula (5.9) is in disagreement with a recent announcement (without 
proof) of Maniya [13]. Maniya’s note appeared subsequent to a restricted paper 
by the authors. 

By using the general formula above it is possible to get, for example, a weight 
function to test discrepancies over only the tails of the distribution, etc. 

Example 2. We next investigate

$\psi(u) = \dfrac{1}{u(1 - u)}$ for $0 < a \le u \le b < 1$,  $\psi(u) = 0$ otherwise,

which is the weight function considered before with the $W^2$ test. By an earlier remark we must have $a > 0$ and $b < 1$, else absorption is certain and $b(z)$ is degenerate. The transformation (5.2) yields

$b_2(z) = \Pr\left\{|X(t)| \le z\sqrt{t},\ \dfrac{a}{1 - a} \le t \le \dfrac{b}{1 - b}\right\}$,

where $X(t)$ is the Wiener-Einstein process. Here it appears convenient to make another transformation. Let $u(t)$ be the Uhlenbeck process with correlation parameter $\beta$; that is, $u(t)$ is stationary Gaussian and Markovian with $E(u(s)u(t)) = \exp(-\beta|t - s|)$. Then from the known correspondence (cf. Doob [5])

$X(t) = \sqrt{t}\;u\!\left(\dfrac{1}{2\beta} \log t\right)$

we obtain

$b_2(z) = \Pr\left\{|u(t)| \le z,\ \dfrac{1}{2\beta} \log \dfrac{a}{1 - a} \le t \le \dfrac{1}{2\beta} \log \dfrac{b}{1 - b}\right\}$,

or since the process is strictly stationary

$b_2(z) = \Pr\left\{|u(t)| \le z,\ 0 \le t \le \dfrac{1}{2\beta} \log \dfrac{b(1 - a)}{a(1 - b)}\right\}$,

which is an absorption probability with a uniform barrier.
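The correspondence between the two processes, and the length of the resulting time interval, can be checked with a few lines of Python. This is an added illustration with hypothetical function names; it only uses the stated covariance $\exp(-\beta|t - s|)$ of the stationary process and confirms that $\sqrt{s}\,u(\log s/(2\beta))$ and $\sqrt{t}\,u(\log t/(2\beta))$ have covariance $\min(s, t)$.

```python
import math

def stationary_cov(s, t, beta):
    """Covariance of the stationary Gaussian Markov process: exp(-beta |t - s|)."""
    return math.exp(-beta * abs(t - s))

def wiener_cov_via_stationary(s, t, beta):
    """Covariance of sqrt(s) u(log s / (2 beta)) and sqrt(t) u(log t / (2 beta)), s, t > 0."""
    return math.sqrt(s * t) * stationary_cov(
        math.log(s) / (2.0 * beta), math.log(t) / (2.0 * beta), beta
    )

def barrier_interval_length(a, b, beta):
    """Length of the time interval for the stationary process: log(b(1-a)/(a(1-b))) / (2 beta)."""
    return math.log(b * (1.0 - a) / (a * (1.0 - b))) / (2.0 * beta)

if __name__ == "__main__":
    print(wiener_cov_via_stationary(0.5, 2.0, 1.0))  # close to 0.5
    print(barrier_interval_length(0.1, 0.9, 1.0))
```

The interval length depends on $a$ and $b$ only through the ratio $b(1 - a)/(a(1 - b))$, which is why a single parameter suffices in what follows.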

The function b2(z) is of some importance in the theories of communications 
and statistical equilibrium (cf. Bellman and Harris [1]), and may eventually be 
tabulated. It seems very difficult to give a complete analysis, but the following 
partial result is given without proof. 

Let $\alpha = \tfrac{1}{2} \log\,[b(1 - a)/(a(1 - b))]$ so that $b_2(z)$ is a function of $\alpha$. Then it is possible to find the Laplace transform of $b_2(z)$ in the following form:


: eb.(z) da = 1f, - 2 —-- aut A 


® D4(/22) + D.(—vV/22) 
[- “PD Al/ 28) + Da(—vV28) as\, 


where D,(z) is the Weber function [23]. It seems very difficult to get even any 
qualitative information from this formula. 


REFERENCES 


[1] R. Bellman and T. Harris, "Recurrence times for the Ehrenfest model," Pacific Jour. Math., Vol. 1 (1951), pp. 179-193.
[2] Harald Cramér, "On the composition of elementary errors," Skandinavisk Aktuarietidskrift, Vol. 11 (1928), pp. 13-74, 141-180.
[3] G. Doetsch, "Les équations aux dérivées partielles du type parabolique," Enseignement Math., Vol. 35 (1936), pp. 43-87.
[4] M. D. Donsker, "Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems," Annals of Math. Stat., Vol. 23 (1952), pp. 277-281.
[5] J. L. Doob, "The brownian motion and stochastic equations," Annals of Mathematics, Vol. 2 (1942), pp. 351-369.
[6] J. L. Doob, "Heuristic approach to the Kolmogorov-Smirnov theorems," Annals of Math. Stat., Vol. 20 (1949), pp. 393-403.
[7] R. Fortet, "Les fonctions aléatoires du type de Markoff associées à certaines équations linéaires aux dérivées partielles du type parabolique," Jour. Math. Pures Appl., Vol. 22 (1943), pp. 177-243.
[8] E. Goursat, Cours d'Analyse Mathématique, Vol. 3, 2nd ed., Gauthier-Villars, Paris, 1915.
[9] A. Hammerstein, "Über Entwicklungen gegebener Funktionen nach Eigenfunktionen von Randwertaufgaben," Math. Zeit., Vol. 27 (1927), pp. 269-311.
[10] M. Kac and A. J. F. Siegert, "An explicit representation of a stationary Gaussian process," Annals of Math. Stat., Vol. 18 (1947), pp. 438-442.
[11] A. N. Kolmogorov, "Sulla determinazione empirica delle leggi di probabilita," Giorn. Ist. Ital. Attuari, Vol. 4 (1933), pp. 1-11.
[12] P. Lévy, Processus Stochastiques et Mouvement Brownien, Gauthier-Villars, Paris, 1948.
[13] G. M. Maniya, "Generalization of the criterion of A. N. Kolmogorov," Doklady Akad. Nauk SSSR (NS), Vol. 69 (1949), pp. 495-497.
[14] G. Pólya, "Remarks on computing the probability integral," Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1949, pp. 63-78.
[15] N. V. Smirnov, "Sur la distribution de ω²," C. R. Acad. Sci. Paris, Vol. 202 (1936), p. 449.
[16] N. V. Smirnov, "On the distribution of the ω² criterion," Rec. Math. (Mat. Sbornik) (NS), Vol. 2 (1937), pp. 973-993.
[17] N. V. Smirnov, "On the deviation of the empirical distribution function," Rec. Math. (Mat. Sbornik) (NS), Vol. 6 (1939), pp. 3-26.
[18] N. V. Smirnov, "Table for estimating goodness of fit of empirical distributions," Annals of Math. Stat., Vol. 19 (1948), pp. 279-281.
[19] R. von Mises, Wahrscheinlichkeitsrechnung, Deuticke, Vienna, 1931.
[20] R. von Mises, "Differentiable statistical functions," Annals of Math. Stat., Vol. 18 (1947), pp. 309-348.
[21] A. Wald and J. Wolfowitz, "Confidence limits for continuous distribution functions," Annals of Math. Stat., Vol. 10 (1939), pp. 105-118.
[22] G. N. Watson, A Treatise on the Theory of Bessel Functions, Cambridge University Press, 1922.
[23] E. T. Whittaker and G. N. Watson, A Course of Modern Analysis, Cambridge University Press, 1927.





A GENERALIZATION OF THE NEYMAN-PEARSON 
FUNDAMENTAL LEMMA' 


By Herman Chernoff and Henry Scheffé
University of Illinois and Columbia University 


1. Summary. Given $m + n$ real integrable functions $f_1, \cdots, f_m$, $g_1, \cdots, g_n$ of a point $x$ in a Euclidean space $X$, a real function $\phi(z_1, \cdots, z_n)$ of $n$ real variables, and $m$ constants $c_1, \cdots, c_m$, the problem considered is the existence of a set $S^0$ in $X$ maximizing $\phi\left(\int_S g_1\,dx, \cdots, \int_S g_n\,dx\right)$ subject to the $m$ side conditions $\int_S f_i\,dx = c_i$, and the derivation of necessary conditions and of sufficient conditions on $S^0$. In some applications the point with coordinates $\left(\int_S g_1\,dx, \cdots, \int_S g_n\,dx\right)$ may also be required to lie in a given set. The results obtained are illustrated with an example of statistical interest. There is some discussion of the computational problem of finding the maximizing $S^0$.


2. The problem. The Neyman-Pearson fundamental lemma concerns the 
problem, given a number of integrable functions, to form their integrals over a 
variable set S, and to find a set S° (if any) for which one of these integrals is 
maximum subject to the condition that the others have fixed values. The gen- 
eralization considered here is to maximize a function of several integrals, subject 
to similar side conditions. 

More precisely, we are given $m + n$ integrable² functions $f_1(x), \cdots, f_m(x)$, $g_1(x), \cdots, g_n(x)$ of a point $x$ in a Euclidean space $X$, a real-valued function $\phi(z_1, \cdots, z_n)$ of $n$ real variables defined on the $n$-dimensional Euclidean space $Z$, or at least on a suitable subset of $Z$ to be specified later, $m$ constants $c_1, \cdots, c_m$, and a subset $A$ of $Z$. Let $S$ denote any Borel set in $X$ and form

(2.1)  $\phi\left(\int_S g_1\,dx, \cdots, \int_S g_n\,dx\right)$.

The problem is the existence and characterization of sets $S^0$ which maximize (2.1) subject to the $m$ conditions

(2.2)  $\int_S f_i\,dx = c_i$  $(i = 1, \cdots, m)$,

and the further condition that the point with the coordinates

$\left(\int_S g_1\,dx, \cdots, \int_S g_n\,dx\right)$

lie in the given subset $A$ of $Z$. If the only side conditions are of the form (2.2) then $A = Z$. The Neyman-Pearson lemma refers to the case where

(2.3)  $n = 1$,  $\phi = z_1$,  $A = Z$.

¹ Work of Scheffé sponsored by the Office of Naval Research.
² With respect to Lebesgue measure on the Borel sets.


Briefly the history of the problem is the following. It arises in the Neyman-Pearson theory of optimum statistical tests and its generalizations. It was treated in the important special case (2.3) by Neyman and Pearson [5], [6], who obtained the inequalities (4.1) below, with the symbol $X_c$ in (4.1) replaced by $X$, as sufficient conditions for a maximizing set $S^0$. The problems of existence and necessity in the case (2.3) were recently solved in all generality by Dantzig and Wald [1]; the necessity problem in this case had been solved under some restrictions (including $m = 1$) in the original paper [5] of Neyman and Pearson. A statistical example which does not come under the special case was recently investigated by Isaacson [4], who obtained sufficient conditions for his problem; this example falls under our treatment and is discussed in Section 8.

In this paper we obtain an existence theorem, and necessary conditions and sufficient conditions for a maximizing $S^0$. To obtain these the results of Dantzig and Wald are employed, as well as their device of considering certain vector measures to which the Lyapunov theorem [3] may be applied. Construction and computation of a maximizing $S^0$ are also considered.


3. Further notation and the condition @. The symbol S (with or without 
superscripts) will always be understood to denote a Borel set in the Euclidean 
space $X$, and the symbols $f_i$, $g_i$ will always denote integrable² functions of a point $x$ in $X$. In addition to the $n$-dimensional Euclidean space $Z$ of points $z = (z_1, \cdots, z_n)$, it is convenient to introduce an $m$-dimensional Euclidean space $Y$ of points $y = (y_1, \cdots, y_m)$. Furthermore, $y(S)$ will denote the point in $Y$ with the coordinates

$y_i(S) = \int_S f_i\,dx$  $(i = 1, \cdots, m)$,

and $z(S)$ the point in $Z$ with the coordinates

$z_i(S) = \int_S g_i\,dx$  $(i = 1, \cdots, n)$.

Let $c$ be the point in $Y$ with the coordinates $(c_1, \cdots, c_m)$. Then the quantity (2.1) to be maximized may be written $\phi(z(S))$, and the side conditions as $y(S) = c$ and $z(S) \in A$.

Call $M$ the range of the $(m + n)$-dimensional vector measure with components

(3.1)  $y_1(S), \cdots, y_m(S), z_1(S), \cdots, z_n(S)$,

that is, $M$ is the set of all points with coordinates (3.1) generated by the totality of Borel sets $S$ in $X$. Then $M$ is a set in the $(m + n)$-dimensional space $Y \times Z$. By the Lyapunov theorem [3] $M$ is closed, bounded, and convex. We shall call





GENERALIZATION OF FUNDAMENTAL LEMMA 215 


$M_c$ the set in $Z$ which is the cross section of $M$ by $y = c$, that is, $M_c$ is the set of all points $z$ for which there exists an $S$ with $z(S) = z$, $y(S) = c$. The projection of $M$ on the space $Y$ will be denoted by $N$, so $N$ is the set of points $y$ for which there exists an $S$ with $y(S) = y$. Thus the side conditions $y(S) = c$ can be satisfied if and only if $c \in N$.

From the Lyapunov theorem $N$ is a closed, bounded, and convex set in $Y$. Let $\pi$ be the smallest-dimensional linear space containing $N$. In the following it is crucial whether the given point $c$ is an inner point or boundary point of $N$ with respect to the topology not of $Y$ but of $\pi$. A point $y$ of $N$ is called an inner point of $N$ if there exists an $m$-dimensional neighborhood $U$ of $y$ such that $U \cap \pi \subset N$, otherwise it is called a boundary point.

If $c$ is a boundary point the inequalities of the Neyman-Pearson lemma and its generalizations have to be considered in a certain subset $X_c$ of $X$ determined by the following definitions. Regard the points in $Y$ as vectors and denote the inner product of two vectors $\xi = (\xi_1, \cdots, \xi_m)$ and $y = (y_1, \cdots, y_m)$ by $\xi \cdot y = \xi_1 y_1 + \cdots + \xi_m y_m$. Then $(\xi^1, \cdots, \xi^r)$ is called a maximal set of vectors relative to a boundary point $c$ if

(3.2)  $\xi^1 \ne 0$,
(3.3)  $\xi^1 \cdot y \le \xi^1 \cdot c$ for all $y \in N$,
(3.4)  $\xi^p \cdot y \le \xi^p \cdot c$ for all $y \in N$ for which $\xi^i \cdot y = \xi^i \cdot c$,  $i = 1, \cdots, p - 1$;  $p = 2, \cdots, r$.

A maximal set $(\xi^1, \cdots, \xi^r)$ is called a complete maximal set relative to $c$ if $(\xi^1, \cdots, \xi^r, \xi^{r+1})$ maximal relative to $c$ implies $\xi^{r+1}$ is a linear combination of $\xi^1, \cdots, \xi^r$. The existence of a complete maximal set relative to every boundary point $c$ is shown by Dantzig and Wald ([1], Lemma 3.1). The set $X_c$ is now defined as $X$ if $c$ is an inner point, while if $c$ is a boundary point it is defined as the subset of $X$ in which

(3.5)  $\sum_{j=1}^{m} \xi_j^i f_j(x) = 0$  $(i = 1, \cdots, r)$,

where $(\xi^1, \cdots, \xi^r)$ is a complete maximal set relative to $c$, and $\xi^i = (\xi_1^i, \cdots, \xi_m^i)$.
If $D$ is the domain of definition of $\phi(z)$ and $\phi(z)$ has a differential at $z^0 = (z_1^0, \cdots, z_n^0) \in D$, then by definition there exist constants³ $a_1, \cdots, a_n$ such that for $z \in D$

(3.6)  $\phi(z) - \phi(z^0) = \sum_{i=1}^{n} a_i (z_i - z_i^0) + o(\|z - z^0\|)$,

where

$\|z - z^0\| = \left[\sum_{i=1}^{n} (z_i - z_i^0)^2\right]^{1/2}$.

³ If furthermore $z^0$ is an interior point of $D$, then $\partial\phi/\partial z_i$ exist at $z^0$ and equal $a_i$ $(i = 1, \cdots, n)$. In the converse direction, if $\partial\phi/\partial z_i$ exist in a neighborhood of $z^0$ and are continuous at $z^0$, then $\phi(z)$ has a differential at $z^0$. However, in the applications below $z^0$ may be a boundary point of $D$.


If $S^0$ is a set such that $\phi(z)$ has a differential at $z^0 = z(S^0)$, and if $a_1, \cdots, a_n$ denote the constants in the differential at $z^0$ as in (3.6), then $S^0$ will be said to satisfy the condition © if there exist constants $k_1, \cdots, k_m$ such that

(3.7)  $\sum_{i=1}^{n} a_i g_i(x) \ge \sum_{j=1}^{m} k_j f_j(x)$  a.e. in $X_c \cap S^0$,

       $\sum_{i=1}^{n} a_i g_i(x) \le \sum_{j=1}^{m} k_j f_j(x)$  a.e. in $X_c - S^0$.


t=1 j=l 

It should be noted that in general the set $S^0$ must first be known before the
constants $a_i$ can be evaluated. This makes the problem of constructing sets $S$
satisfying the condition $\mathcal{C}$ and the side conditions inherently more difficult in
the general case than in the special case (2.3), since in the general case the
coefficients $a_i$ in the condition $\mathcal{C}$ are functions of the set $S$ to be determined,
while in the special case there is only one $a_i$ which is always unity. Further
attention is given to the problem of construction and computation of a
maximizing $S$ in Section 9.

About the condition $\mathcal{C}$ we remark also that if $c$ is a boundary point, $X_c$ will
frequently be a set of measure zero, in which case the condition $\mathcal{C}$ is vacuous.
In this case it may be shown ([1], Lemma 3.2) that the set $S$ satisfying the
side conditions $y(S) = c$ is unique up to a set of measure zero.


4. Results of Dantzig and Wald. These concern the special case (2.3). They 
include an existence theorem which we shall not need (it is covered by our 
Theorem 5.1) and the following theorem which we shall. Here we write
$g(x) = g_1(x)$. The set $X_c$ is defined in connection with (3.5).

THEOREM 4.1 (Dantzig and Wald). If $S^0$ is a set satisfying $y(S^0) = c$, then a
necessary and sufficient condition that $S^0$ maximize $\int_S g(x)\,dx$ subject to the
condition $y(S) = c$ is that there exist constants $k_1, \dots, k_m$ such that

(4.1)    $g(x) \ge \sum_{j=1}^{m} k_j f_j(x)$ a.e. in $X_c \cap S^0$,

         $g(x) \le \sum_{j=1}^{m} k_j f_j(x)$ a.e. in $X_c - S^0$.


5. Existence theorem. For our method of proof of the existence theorem to
succeed it is essential that the set $A$ be closed. It may nevertheless be possible
to use the theorem in situations where the given $A$ is not closed, by applying it
to a closed set $A_1$ containing $A$ and then arguing that the maximum cannot
occur in $A_1 - A$; an example is given in Section 8.





GENERALIZATION OF FUNDAMENTAL LEMMA 217 


THEOREM 5.1. If there exists a set $S$ satisfying the side conditions $y(S) = c$ and
$z(S) \in A$, if $\phi(z)$ is continuous in $M_c \cap A$, and if $A$ is closed,⁴ then there exists a
set $S^0$ maximizing $\phi(z(S))$ subject to the conditions $y(S) = c$ and $z(S) \in A$.

PROOF. Since $M$ is closed and bounded, so is $M_c$, and therefore $M_c \cap A$. Also
$M_c \cap A$ is nonempty because there exists an $S$ satisfying $y(S) = c$ and $z(S) \in A$.
Since $\phi(z)$ is continuous in the nonempty closed bounded set $M_c \cap A$, there exists
a point $z^0 \in M_c \cap A$ such that

$\phi(z^0) = \sup \phi(z)$ for $z \in M_c \cap A$.

Now $z^0 \in M_c$ implies the existence of an $S^0$ with $z(S^0) = z^0$ and $y(S^0) = c$. For
any other $S$ satisfying $y(S) = c$ and $z(S) \in A$ we have $z(S) \in M_c \cap A$, hence
$\phi(z(S)) \le \phi(z^0) = \phi(z(S^0))$.


6. Necessary conditions. Suppose $\phi(z)$ takes on its maximum value in $M_c \cap A$
at $z^0 = z(S^0)$. The hypotheses of the following theorem imply that $z^0$ is an
interior (in the topology of the $n$-dimensional space $Z$) point of $A$. This will of
course be the case if $A$ is open, and in particular if $A = Z$. On the other hand
it is easily seen that $z^0$ must be a boundary (same topology) point of $M_c \cap A$,
unless all the constants $a_i$ (see equation (3.6)) in the differential of $\phi(z)$ at $z^0$
vanish. An $S^0$ for which all $a_i = 0$ at $z^0 = z(S^0)$ will always satisfy the condition
$\mathcal{C}$ (with all $k_j = 0$).

THEOREM 6.1. If $S^0$ is a set for which $z(S^0)$ is an interior point⁵ of $A$, if $\phi(z)$ is
defined in $M_c \cap A$ and has a differential at $z = z(S^0)$, then a necessary condition
that $S^0$ maximize $\phi(z(S))$ subject to the conditions $y(S) = c$ and $z(S) \in A$ is that
$S^0$ satisfy the condition $\mathcal{C}$.

PROOF. Assume $S^0$ satisfies $y(S) = c$ and $z(S) \in A$, and maximizes $\phi(z(S))$
subject to these conditions. Let $z^0 = z(S^0) = (z_1^0, \dots, z_n^0)$, let $a_i$ be the constants
in the differential of $\phi(z)$ at $z^0$ as in (3.6), and define

$\bar\phi(z) = a_0 + \sum_{i=1}^{n} a_i z_i$,    $a_0 = \phi(z^0) - \sum_{i=1}^{n} a_i z_i^0$,

$\bar\phi(z(S)) = a_0 + \sum_{i=1}^{n} a_i z_i(S) = a_0 + \sum_{i=1}^{n} a_i \int_S g_i\,dx$,

(6.1)    $\bar\phi(z(S)) = a_0 + \int_S \left(\sum_{i=1}^{n} a_i g_i\right) dx$.


⁴ A hypothesis of Theorem 5.1 is that there exists a set $S^1$ satisfying the side conditions.
Let $z^1 = z(S^1)$. The hypothesis that $A$ is closed may be replaced by the sometimes useful
weaker hypothesis that the set $\{z \mid z \in (M_c \cap A)$ and $\phi(z) \ge \phi(z^1)\}$ is closed.

⁵ The proof shows that this hypothesis may be replaced by the weaker one that $z^0 = z(S^0)$
is a limit point of $L \cap A$ for every line $L$ in $Z$ through $z^0$.




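It will suffice to prove that $S^0$ maximizes $\bar\phi(z(S))$ subject to $y(S) = c$. If this is
true we can apply the necessary condition of Theorem 4.1 to $g(x) = \sum_{i=1}^{n} a_i g_i(x)$,
and this necessary condition becomes our condition $\mathcal{C}$.

Suppose then that $S^0$ does not maximize $\bar\phi(z(S))$ subject to $y(S) = c$. Then
there exists an $S^1$ with $y(S^1) = c$ and $\bar\phi(z(S^1)) > \bar\phi(z(S^0))$. We note that
$z^1 = z(S^1)$ is in $M_c$ but not necessarily in $A$, and that $z^1 \neq z^0$ since $\bar\phi(z^1) > \bar\phi(z^0)$.

Let $\rho = \|z^1 - z^0\|$, and $h\rho = \bar\phi(z^1) - \bar\phi(z^0)$, so that $h > 0$. Write

$z^\lambda = (1 - \lambda) z^0 + \lambda z^1 \quad (0 \le \lambda \le 1)$.

Then all $z^\lambda \in M_c$ since $M_c$ is convex. Because $\bar\phi(z)$ is a linear function of $z$ with
$\bar\phi(z^1) = \bar\phi(z^0) + h\rho$, it follows that

(6.2)    $\bar\phi(z^\lambda) = \bar\phi(z^0) + \lambda h \rho$.

From (3.6) we have for $z \in M_c \cap A$

$\phi(z) = \bar\phi(z) + o(\|z - z^0\|)$,

and hence if $z^\lambda \in M_c \cap A$,

$\phi(z^\lambda) = \bar\phi(z^\lambda) + o(\lambda)$.

Thus there exists a $\delta > 0$ such that $0 < \lambda\rho < \delta$ and $z^\lambda \in M_c \cap A$ imply

$\left| [\phi(z^\lambda) - \bar\phi(z^\lambda)] / (\lambda\rho) \right| < h$,

and so

$\phi(z^\lambda) > \bar\phi(z^\lambda) - \lambda h \rho$.

From this, (6.2), and $\bar\phi(z^0) = \phi(z^0)$, we get

(6.3)    $\phi(z^\lambda) > \phi(z^0)$

if $0 < \lambda < \delta/\rho$ and $z^\lambda \in M_c \cap A$. Recalling that $z^0$ is an interior point of $A$, we
see there is a $\lambda'$, $0 < \lambda' < \delta/\rho$, such that $z^{\lambda'} \in A$. Also $z^{\lambda'} \in M_c$, so $z^{\lambda'} \in M_c \cap A$,
and (6.3) is true for $\lambda = \lambda'$. But $z^{\lambda'} \in M_c \cap A$ also implies that there exists an
$S^{\lambda'}$ with $y(S^{\lambda'}) = c$, $z(S^{\lambda'}) = z^{\lambda'} \in A$. For this $S^{\lambda'}$ we have $\phi(z(S^{\lambda'})) > \phi(z(S^0))$,
so $S^0$ does not maximize $\phi(z(S))$ subject to $y(S) = c$ and $z(S) \in A$. This is a
contradiction, and hence $S^0$ maximizes $\bar\phi(z(S))$ subject to $y(S) = c$.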


7. Sufficient conditions. It is convenient to introduce a weakened form of the
property of concavity of a function, which we shall call quasi-concavity;
related concepts have been considered by de Finetti [2]. A function $\phi(z)$ defined
in a convex set $D$ is said to be concave in $D$ if $z^0 \in D$, $z^1 \in D$, $z^\lambda = (1 - \lambda) z^0 + \lambda z^1$,
and $0 \le \lambda \le 1$ imply

$\phi(z^\lambda) \ge (1 - \lambda)\phi(z^0) + \lambda\phi(z^1)$.

If $D$ is open and convex, and $\phi(z)$ has continuous second partial derivatives in
$D$, then a necessary and sufficient condition for $\phi(z)$ to be concave in $D$ is that
the $n \times n$ matrix

(7.1)    $(\partial^2\phi / \partial z_i \partial z_j)$







be nonpositive in $D$, that is, all the characteristic roots be nonpositive in $D$.
We shall say $\phi(z)$ is quasi-concave in a convex set $D$ if there exists a real
differentiable function $\psi(\phi)$ on an interval $I$ containing the range $\phi(D)$ of $\phi(z)$,
with $0 < \psi'(\phi) < +\infty$ for $\phi \in I$, and such that $\psi(\phi(z))$ is concave in $D$. We
note that concavity implies quasi-concavity (take $\psi(\phi) = \phi$), but not conversely
(for example, with $n = 2$ consider $\phi(z) = z_1 z_2$ in the set $D$ where $z_1 > 0$, $z_2 > 0$,
and take $\psi(\phi) = \log \phi$).
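
The example just given can be checked numerically. The following is only an
illustrative sketch, not part of the original argument: the helper name
`midpoint_gap` and the test points are our own choices. It evaluates the
midpoint form of the concavity inequality for $\phi(z) = z_1 z_2$ and for
$\log \phi(z)$ on the positive quadrant.

```python
import math
from itertools import product

def phi(z1, z2):
    """The example function phi(z) = z1 * z2 on the set z1 > 0, z2 > 0."""
    return z1 * z2

def log_phi(z1, z2):
    """psi(phi(z)) with psi = log, as suggested in the text."""
    return math.log(phi(z1, z2))

def midpoint_gap(f, p, q):
    """f(midpoint of p and q) minus the average of f(p) and f(q).

    A concave function never gives a negative value here.
    (Hypothetical helper, introduced only for this illustration.)
    """
    mid = tuple((a + b) / 2 for a, b in zip(p, q))
    return f(*mid) - (f(*p) + f(*q)) / 2

# phi itself violates concavity along the diagonal z1 = z2 ...
gap_phi = midpoint_gap(phi, (1.0, 1.0), (3.0, 3.0))

# ... while log(phi) satisfies the midpoint inequality at the same points
gap_log = midpoint_gap(log_phi, (1.0, 1.0), (3.0, 3.0))

# and on every pair of points of a small positive grid.
grid = [(a, b) for a, b in product([0.5, 1.0, 2.0, 4.0], repeat=2)]
worst_log_gap = min(midpoint_gap(log_phi, p, q) for p in grid for q in grid)

print(gap_phi, gap_log, worst_log_gap)
```

A negative `gap_phi` shows that $z_1 z_2$ is not concave, while the
nonnegative gaps for `log_phi` are consistent with quasi-concavity; a grid
check of this kind is evidence, of course, and not a proof.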

THEOREM 7.1. If the set $S^0$ satisfies the side conditions $y(S) = c$ and $z(S) \in A$,
if $\phi(z)$ is defined and quasi-concave in a convex set containing $M_c \cap A$ and has a
differential at $z = z(S^0)$, then a sufficient condition that $S^0$ maximize $\phi(z(S))$
subject to $y(S) = c$ and $z(S) \in A$ is that $S^0$ satisfy the condition $\mathcal{C}$.

PROOF. Suppose first that $\phi(z)$ is concave instead of merely quasi-concave in
a convex set $D \supset M_c \cap A$, that the other hypotheses of the theorem are
satisfied by $\phi(z)$ and $S^0$, and that $S^0$ satisfies the condition $\mathcal{C}$. Write $z^0 = z(S^0)$ and
define the linear function $\bar\phi(z)$ as in the proof of Theorem 6.1. Then $S^0$
maximizes $\bar\phi(z(S))$ subject to the condition $y(S) = c$, since the condition $\mathcal{C}$ now
becomes the sufficient condition of Theorem 4.1 applied to $\bar\phi(z(S))$ in the form
(6.1).

Next we note that $\phi(z) \le \bar\phi(z)$ in $D$. Assume the contrary, that there exists
a point $z^1 \in D$ with $b = \phi(z^1) - \bar\phi(z^1) > 0$, so $z^1 \neq z^0$ since $\phi(z^0) = \bar\phi(z^0)$. If
$z^\lambda = (1 - \lambda) z^0 + \lambda z^1$ $(0 \le \lambda \le 1)$, then $z^\lambda \in D$. Define
$\tilde\phi(z^\lambda) = (1 - \lambda)\phi(z^0) + \lambda\phi(z^1)$. Then $\phi(z^\lambda) \ge \tilde\phi(z^\lambda)$ since $\phi(z)$ is concave.
But $\bar\phi(z^\lambda) = \tilde\phi(z^\lambda) - \lambda b$, and hence, since $\phi(z)$ has a differential at $z^0$,
$\phi(z^\lambda) < \bar\phi(z^\lambda) + \lambda b = \tilde\phi(z^\lambda)$ for $\lambda$ sufficiently small but positive. This
contradicts $\phi(z^\lambda) \ge \tilde\phi(z^\lambda)$.

If now $S$ is any set satisfying $y(S) = c$ and $z(S) \in A$, then $z(S) \in M_c \cap A \subset D$,
hence $\phi(z(S)) \le \bar\phi(z(S)) \le \bar\phi(z(S^0)) = \phi(z(S^0))$, the second inequality because
$S^0$ maximizes $\bar\phi(z(S))$ subject to $y(S) = c$. The theorem is now proved in the
case where $\phi(z)$ is concave.

Suppose next $\phi(z)$ is quasi-concave in $D$. By definition there exists a
differentiable function $\psi(\phi)$ on an interval $I$ containing $\phi(D)$, such that
$0 < \psi'(\phi) < +\infty$ for $\phi \in I$, and $\Phi(z) = \psi(\phi(z))$ is concave in $D$. Since $\psi(\phi)$ is a
strictly increasing function on $I$, a set $S^0$ maximizes $\phi(z(S))$ subject to $y(S) = c$ and
$z(S) \in A$ if and only if it maximizes $\Phi(z(S))$ subject to the same side conditions.
Since $\Phi(z)$ is concave in $D$ we may apply the above result to $\Phi(z)$ after we verify
that $\Phi(z)$ has a differential at $z^0 = z(S^0)$. But this is the case since $\phi(z)$ has a
differential at $z^0$ and $\psi'(\phi)$ exists at $\phi = \phi(z^0)$. Let $\gamma = \psi'(\phi(z^0))$. Then the
constants $a_i$ in the differential of $\Phi(z)$ at $z^0$ are equal to $\gamma$ times those for $\phi(z)$ at $z^0$.
The factor $\gamma$ can be absorbed into the constants $k_1, \dots, k_m$ of the condition
$\mathcal{C}$ since $0 < \gamma < +\infty$.

The following corollary may be useful in applications where it is easier to 
prove that ¢(z) is suitably dominated by a quasi-concave function than that 
¢(z) is quasi-concave. 

COROLLARY 7.1. If the set $S^0$ satisfies the side conditions $y(S) = c$ and $z(S) \in A$,
if $U$ is a neighborhood in $Z$ of $z^0 = z(S^0)$, if $\phi(z)$ is defined in a set
$D \supset U \cup (M_c \cap A)$ and has a differential at $z^0$, if $\phi^*(z)$ is defined and
quasi-concave in a convex set $D^* \supset D$, and if $\phi(z) \le \phi^*(z)$ in $D$ while
$\phi(z^0) = \phi^*(z^0)$, then a sufficient condition that $S^0$ maximize $\phi(z(S))$ subject to
$y(S) = c$ and $z(S) \in A$ is that $S^0$ satisfy the condition $\mathcal{C}$.

PROOF. The corollary will be an immediate consequence of applying Theorem
7.1 to $\phi^*(z)$ instead of $\phi(z)$, provided we can prove that under the hypotheses
of the corollary $\phi^*(z)$ has a differential at $z^0$, and that this is the same as the
differential of $\phi(z)$ at $z^0$ (else a set of constants $a_1^*, \dots, a_n^*$ different from
$a_1, \dots, a_n$ would appear in the condition $\mathcal{C}$).

Suppose $\Phi^*(z) = \psi(\phi^*(z))$ is concave in $D^*$, where $0 < \psi'(\phi) < +\infty$ for
$\phi \in I \supset \phi^*(D^*)$. Let $\Phi(z) = \psi(\phi(z))$. Since $\psi(\phi)$ has a single-valued differentiable
inverse $\psi^{-1}$ on the interval $J = \psi(I)$, and $\phi^*(z) = \psi^{-1}(\Phi^*(z))$, $\phi(z) = \psi^{-1}(\Phi(z))$,
it will suffice to prove that $\Phi^*(z)$ has a differential at $z^0$ and that this is the same
as the differential of $\Phi(z)$ at $z^0$. Since $\Phi^*(z)$ is concave it has a plane of support
$w = \tilde\Phi(z)$ at $z^0$, that is, there exists a linear function $\tilde\Phi(z)$ such that $\Phi^*(z) \le \tilde\Phi(z)$
for $z \in D^*$ and $\Phi^*(z^0) = \tilde\Phi(z^0)$. We observe next that this plane is identical with
the tangent plane $w = \bar\Phi(z)$ to the surface $w = \Phi(z)$ at $z^0$. For, suppose the
contrary. Then because $\tilde\Phi(z)$ and $\bar\Phi(z)$ are both linear and $\tilde\Phi(z^0) = \bar\Phi(z^0)$, there
must exist a point $z^1$ where $\tilde\Phi(z^1) < \bar\Phi(z^1)$. With $z^\lambda = (1 - \lambda) z^0 + \lambda z^1$,
$\tilde\Phi(z^\lambda) < \bar\Phi(z^\lambda)$ for $\lambda > 0$. Therefore, since $\Phi(z)$ has a differential at $z^0$,
$\Phi(z^\lambda) > \tilde\Phi(z^\lambda)$ for $\lambda$ sufficiently small and positive. But this implies the
contradiction $\Phi(z^\lambda) > \Phi^*(z^\lambda)$. From the relation $\Phi(z) \le \Phi^*(z) \le \tilde\Phi(z)$ in $D$, the
desired conclusion about the differential of $\Phi^*(z)$ at $z^0$ easily follows.


8. An example. We will illustrate our results by considering their application 
in the theory of Type D critical regions for testing simple hypotheses concerning 
several parameters. Type D regions were recently defined and studied by Isaacson 
[4]; they are locally optimum unbiased critical regions which are a generalization 
of the Type A regions of the Neyman-Pearson theory for the one-parameter 
case. 

Suppose $X$ is the sample space and there exists a probability density $p(x, \theta)$
for $\theta = (\theta_1, \dots, \theta_k)$ in the parameter space $\Omega$. The hypothesis to be tested is
$H_0: \theta = \theta^0$. We assume that for any set $S$ in $X$ the integral $\int_S p\,dx$ has second
partial derivatives with respect to $\theta_i$ and $\theta_j$ ($i, j = 1, \dots, k$) in a neighborhood
of $\theta^0$ which are continuous at $\theta^0$, and that it can be differentiated twice under
the integral sign with respect to $\theta_i$ and $\theta_j$ at $\theta^0$. Denote by $G(S)$ the symmetric
matrix $\left(\int_S g_{ij}\,dx\right)$, where $g_{ij} = [\partial^2 p / \partial\theta_i \partial\theta_j]_{\theta^0}$. Also write

(8.1)    $f_i = [\partial p / \partial\theta_i]_{\theta^0}$ $(i = 1, \dots, k)$,  $m = k + 1$,  $f_m = p(x, \theta^0)$.


It is convenient to call a critical region $S$ for testing $H_0$ locally unbiased of size
$\alpha$ if

(8.2)    $\int_S f_i\,dx = 0$ $(i = 1, \dots, k)$,  $\int_S f_m\,dx = \alpha$,  $G(S)$ is positive definite.


If $S$ is locally unbiased of size $\alpha$ the (generalized) Gaussian curvature of the
power surface at $\theta = \theta^0$ is the determinant $|G(S)|$. A critical region $S^0$ is said
to be of Type D if it maximizes $|G(S)|$ subject to the condition that it be locally
unbiased of size $\alpha$. If $S^0$ is locally unbiased of size $\alpha$, Isaacson obtained as a
sufficient condition for $S^0$ to be of Type D the existence of constants $k_1, \dots, k_m$
such that $S^0$ satisfies

(8.3)    $\sum_{i,j=1}^{k} b_{ij} g_{ij}(x) \ge \sum_{i=1}^{m} k_i f_i(x)$ a.e. in $S^0$,

         $\sum_{i,j=1}^{k} b_{ij} g_{ij}(x) \le \sum_{i=1}^{m} k_i f_i(x)$ a.e. in $X - S^0$,

where the matrix $(b_{ij})$ of constants is the adjoint matrix of $G(S^0)$.

To make the problem conform better to our previous notation we introduce
an $n$-dimensional space $Z$ of points $z$, with $n = \tfrac{1}{2}k(k + 1)$, and write the
coordinates of $z$ as

$(z_{11}, z_{12}, \dots, z_{1k}, z_{22}, \dots, z_{2k}, z_{33}, \dots, z_{kk})$.

Define $\phi(z)$ to be the determinant of the symmetric matrix $(z_{ij})$, $\phi(z) = |(z_{ij})|$,
where $z_{ji} = z_{ij}$. With $z_{ij}(S) = \int_S g_{ij}\,dx$, we see the problem is to maximize
$\phi(z(S))$ subject to the side conditions (8.2). These may be written $y(S) = c$,
where $c = (0, \dots, 0, \alpha)$, and $z(S) \in A$, where $A$ is the part of $Z$ where the
matrix $(z_{ij})$ is positive definite. Since $\phi(z)$ is a polynomial in the coordinates of
$z$ it has a differential everywhere. If we write $z^0 = z(S^0)$ and $a_{ij} = [\partial\phi / \partial z_{ij}]_{z^0}$
($i \le j$), we find $a_{ii} = b_{ii}$, $a_{ij} = 2 b_{ij}$ ($i < j$), and so the condition $\mathcal{C}$ for the
present problem is Isaacson's stated in connection with (8.3), except that the first
inequality of (8.3) is asserted a.e. in $X_c \cap S^0$ and the second a.e. in $X_c - S^0$.
However, it will be shown later that for all $\alpha \neq 0$ or $1$, $c$ is an inner point as
defined in Section 3, so that the set $X_c$ is the whole space $X$, and Isaacson's
condition is thus precisely the condition $\mathcal{C}$ in this case.

To apply our results we need to note that the set $A$ is open and is contained
in a closed set $A_1$ such that $\phi(z) = 0$ in $A_1 - A$. Let $h_i(z)$ with $i = 1, \dots,
2^k - 1$, denote the determinants of principal minors of the matrix $(z_{ij})$; these
polynomials in the coordinates of $z$ are continuous functions of $z$. Since $A$ is the
set where all $h_i(z) > 0$, $A$ is open. Let $A_1$ be the set in $Z$ where all $h_i(z) \ge 0$;
then $A_1$ is closed. In $A_1 - A$ all $h_i(z) \ge 0$, some $h_i(z) = 0$. Thus $(z_{ij})$ is positive
semidefinite but not positive definite in $A_1 - A$, and hence its determinant
$\phi(z) = 0$ there.

We shall prove first by application of our existence theorem 5.1 that if there
exists any locally unbiased critical region of size $\alpha$ there exists one of Type D.
Suppose then that there exists a critical region $S^1$ satisfying the side conditions
(8.2). Then $\phi(z(S^1)) = |G(S^1)| > 0$. Theorem 5.1 tells us there exists a
solution $S^0$ to the modified problem of maximizing $\phi(z(S))$ subject to $y(S) = c$ and
$z(S) \in A_1$. For the solution $S^0$ we must have $z(S^0) \in A$, else $z(S^0) \in A_1 - A$,
$\phi(z(S^0)) = 0 < \phi(z(S^1))$. Thus $S^0$ maximizes $\phi(z(S))$ subject to $y(S) = c$ and
$z(S) \in A$.

That any critical region of Type D necessarily satisfies the condition $\mathcal{C}$
follows immediately from Theorem 6.1.

That the condition $\mathcal{C}$ is sufficient for a locally unbiased critical region $S^0$ of
size $\alpha$ to be of Type D may be deduced from Theorem 7.1. All the hypotheses
of this theorem will be seen to be satisfied if we show that the set $A$ is convex
and the function $\log \phi(z)$ is concave in $A$. Suppose then that $\zeta^0$ and $\zeta^1$ are any
two points of $A$. It will suffice to prove that $\zeta^\lambda = (1 - \lambda)\zeta^0 + \lambda\zeta^1$ is in $A$ and
that

(8.4)    $\log \phi(\zeta^\lambda) \ge (1 - \lambda) \log \phi(\zeta^0) + \lambda \log \phi(\zeta^1)$

for all $\lambda$ ($0 \le \lambda \le 1$). Let $\zeta^\lambda_{ij}$ be the coordinates of $\zeta^\lambda$. Then the matrices $(\zeta^r_{ij})$
are positive definite for $r = 0, 1$. There thus exists a real nonsingular matrix $H$
such that both matrices $H'(\zeta^0_{ij})H$ and $H'(\zeta^1_{ij})H$ are diagonal, say $H'(\zeta^r_{ij})H = D^r$,
where $D^r$ is a diagonal matrix with positive diagonal elements $d^r_1, \dots, d^r_k$
($r = 0, 1$). Now

$(\zeta^\lambda_{ij}) = (1 - \lambda)(\zeta^0_{ij}) + \lambda(\zeta^1_{ij})$,

and so $(\zeta^\lambda_{ij}) = K' D^\lambda K$, where $K = H^{-1}$, and $D^\lambda$ is a diagonal matrix with $i$th
diagonal element equal to $(1 - \lambda) d^0_i + \lambda d^1_i$. Hence $D^\lambda$ is positive definite, and
so is $(\zeta^\lambda_{ij})$; thus $\zeta^\lambda$ is in $A$. Furthermore,

$\log \phi(\zeta^\lambda) = 2 \log |K| + \log |D^\lambda|$,

so to prove (8.4) it is enough to verify that

$\log |D^\lambda| \ge (1 - \lambda) \log |D^0| + \lambda \log |D^1|$,

or that

$\sum_{i=1}^{k} \log[(1 - \lambda) d^0_i + \lambda d^1_i] \ge (1 - \lambda) \sum_{i=1}^{k} \log d^0_i + \lambda \sum_{i=1}^{k} \log d^1_i$.

But this follows from the concavity of the function $\log z$.
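
The inequality (8.4) can also be spot-checked numerically for $k = 2$. The
sketch below is only an illustration under our own assumptions: a symmetric
$2 \times 2$ matrix is stored as a hypothetical triple `(a, b, d)`, and the
helper names and random test matrices are ours.

```python
import math
import random

def det2(m):
    """Determinant of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = m
    return a * d - b * b

def is_positive_definite(m):
    """Leading principal minors positive (the set A of the text, k = 2)."""
    return m[0] > 0 and det2(m) > 0

def mix(m0, m1, lam):
    """The point (1 - lam) * m0 + lam * m1, coordinate by coordinate."""
    return tuple((1 - lam) * x + lam * y for x, y in zip(m0, m1))

def random_pd(rng):
    """A random positive definite matrix: |b| < sqrt(a * d)."""
    a = rng.uniform(0.1, 5.0)
    d = rng.uniform(0.1, 5.0)
    b = rng.uniform(-0.9, 0.9) * math.sqrt(a * d)
    return (a, b, d)

def concavity_gap(m0, m1, lam):
    """Left side of (8.4) minus right side; concavity means this is >= 0."""
    lhs = math.log(det2(mix(m0, m1, lam)))
    rhs = (1 - lam) * math.log(det2(m0)) + lam * math.log(det2(m1))
    return lhs - rhs

rng = random.Random(0)
trials = [(random_pd(rng), random_pd(rng), rng.random()) for _ in range(2000)]
all_mixtures_pd = all(is_positive_definite(mix(m0, m1, lam)) for m0, m1, lam in trials)
worst_gap = min(concavity_gap(m0, m1, lam) for m0, m1, lam in trials)
print(all_mixtures_pd, worst_gap)
```

Every mixture stays positive definite (convexity of $A$) and the smallest
gap stays nonnegative up to rounding, in agreement with the argument above.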
We shall conclude by proving that $c$ is an inner point of $N$ in this and similar
statistical problems with side conditions of the form

$\int_S p(x, \theta^0)\,dx = \alpha \quad (0 < \alpha < 1)$,

$\left[\frac{\partial}{\partial\theta_i} \int_S p(x, \theta)\,dx\right]_{\theta^0} = 0 \quad (i = 1, \dots, k)$,

if the integral $\int_S p(x, \theta)\,dx$ can be differentiated once under the integral sign
for all (Borel) sets $S$ at $\theta = \theta^0$. With the notation (8.1), and $y_i(S) = \int_S f_i\,dx$,
$N$ is the set of all $y(S)$ in the $m$-dimensional space $Y$ with $m = k + 1$. We
observe that the set $N$ is symmetrical with respect to the point $(0, \dots, 0, \tfrac{1}{2})$, that
is if $y = (y_1, \dots, y_{m-1}, y_m)$ is in $N$ so is $(-y_1, \dots, -y_{m-1}, 1 - y_m)$. For
any $y \in N$ there exists an $S$ such that $y(S) = y$. The point $y(X - S)$ is
symmetrically placed with respect to $(0, \dots, 0, \tfrac{1}{2})$, since for $i = 1, \dots, m - 1$,

$y_i(S) + y_i(X - S) = \left[\frac{\partial}{\partial\theta_i} \left\{\int_S p(x, \theta)\,dx + \int_{X-S} p(x, \theta)\,dx\right\}\right]_{\theta^0} = 0$,

while $y_m(S) + y_m(X - S) = 1$. On taking $S$ to be the empty set we find the
point $y^0 = (0, \dots, 0, 0)$ in $N$; by symmetry $N$ contains $y^1 = (0, \dots, 0, 1)$,
and by convexity the line segment $L$ joining $y^0$ and $y^1$ and containing
$c = (0, \dots, 0, \alpha)$. Since $0 < \alpha < 1$, $c$ is an inner point of the line segment $L$. From
this and the symmetry of $N$ it may be argued geometrically that $c$ is an inner
point of $N$, but we shall give an analytic proof instead.

We shall suppose now that $c$ is a boundary point of the convex body $N$ and
from this derive a contradiction. There exists a linear function
$h(y) = h_1 y_1 + \dots + h_m y_m$ not identically zero in $N$ such that $y = c$ maximizes
$h(y)$ for $y$ in $N$. The maximum value of $h(y)$ is thus $h_m \alpha$, and hence
$h(y^1) = h_m \le h_m \alpha$ and $h(y^0) = 0 \le h_m \alpha$; therefore $h_m = 0$. Since zero is the
maximum of $h(y)$ for $y$ in $N$ and $h(y)$ does not vanish identically in $N$, there exists
a set $S$ in the space $X$ such that $h(y(S)) < 0$. But

$h(y(S)) = \sum_{i=1}^{m-1} h_i y_i(S) = -\sum_{i=1}^{m-1} h_i y_i(X - S) = -h(y(X - S))$.

Thus $h(y(X - S)) > 0$. But $h(y) \le 0$ for $y$ in $N$. This is the desired
contradiction.
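
The symmetry of $N$ about $(0, \dots, 0, \tfrac{1}{2})$ used above can be illustrated on
a toy case. The sketch below is not from the paper: it assumes a
hypothetical one-parameter family $p(x, \theta) \propto e^{\theta x}$ on the
four-point space $\{0, 1, 2, 3\}$ with $\theta^0 = 0$, replaces integrals by
sums, and enumerates all subsets $S$. Because the space has atoms, the
resulting $N$ is a finite set and not a convex body; only the symmetry is
being illustrated.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical discrete analogue: p(x, theta) proportional to exp(theta * x)
# on X = {0, 1, 2, 3}, evaluated at theta0 = 0 (so p is uniform).
X = [0, 1, 2, 3]
p0 = {x: Fraction(1, 4) for x in X}

# For this family, d p / d theta at theta0 equals p0(x) * (x - mean).
mean = sum(x * p0[x] for x in X)
f1 = {x: p0[x] * (x - mean) for x in X}

def y(S):
    """y(S) = (sum over S of f_1, sum over S of f_m), with f_m = p0."""
    return (
        sum((f1[x] for x in S), Fraction(0)),
        sum((p0[x] for x in S), Fraction(0)),
    )

subsets = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]
N = {y(S) for S in subsets}

# Reflect every point of N through (0, 1/2): (y1, y2) -> (-y1, 1 - y2).
reflected = {(-a, 1 - b) for (a, b) in N}
print(N == reflected)
```

The reflected set coincides with `N` because $y(S) + y(X - S) = (0, 1)$ for
every subset $S$, which is the discrete counterpart of the identity in the
text.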


9. Remarks on computation of a solution. We have mentioned that the problem 
of construction of a solution is much more difficult here than in the special 
case covered by the Neyman-Pearson fundamental lemma. We now sketch a 
general approach which perhaps might be modified and expanded to a method 
of numerical computation if desired. The basic idea is that the condition $\mathcal{C}$
reduces the search for a minimizing set among all Borel sets to that for a
minimum of a function of $n + m$ real variables or an equivalent problem.

Denote the $(n + m)$-dimensional vector $(a_1, \dots, a_n, k_1, \dots, k_m)$ by
$v = (v_1, \dots, v_{n+m})$, and by $S(v)$ the set
$\{x \mid \sum_{i=1}^{n} a_i g_i(x) \ge \sum_{j=1}^{m} k_j f_j(x)\}$. With
$y(S)$ and $z(S)$ defined as before, let $Y(v) = y(S(v))$, $Z(v) = z(S(v))$. If $\phi(z)$ has
a differential at $z = Z(v)$, denote the differential coefficients there by
$\Phi_1(v), \dots, \Phi_n(v)$. Let $\delta(z, A)$ be a continuous function of $z$ which is
nonnegative and vanishes if and only if $z \in A$ (this implies $A$ is closed): an example
is the Euclidean distance from $z$ to $A$. We now define three functions of $v$:

$D(v) = \delta(Z(v), A)$,

$E(v) = \sum_{j=1}^{m} [Y_j(v) - c_j]^2$,

where $Y_j(v)$ are the components of $Y(v)$, and

$F(v) = \sum_{i=1}^{n} [\Phi_i(v) - v_i]^2$.

The function $F(v)$ is defined only for $v$ such that $\phi(z)$ has a differential at
$z = Z(v)$.

We next make the following simplifying assumptions (which would be lightened
if the sketch of our method were expanded):

(i). The conditions of our existence theorem, Theorem 5.1, are satisfied.

(ii). $X_c = X$, that is, $c$ is an inner point of $N$ (see Section 3).

(iii). The set $\{x \mid \sum_{i=1}^{n} a_i g_i(x) - \sum_{j=1}^{m} k_j f_j(x) = 0\} \cap (X - X^0)$,
where

$X^0 = \{x \mid g_1(x) = \dots = g_n(x) = f_1(x) = \dots = f_m(x) = 0\}$,

has measure zero for all vectors $v$ with the components $a_1, \dots, a_n$ not all zero.

(iv). For any solution $S^0$, $z^0 = z(S^0)$ is an interior point of $A$, and $\phi(z)$ has a
nonzero differential at $z^0$.

(v). $\phi(z)$ is defined and quasi-concave in a convex set containing $A \cap M_c$.
Under assumption (i) a solution of course exists, and under this set of
assumptions it is easy to see from Theorems 6.1 and 7.1 that a necessary and
sufficient condition for $S^0$ to be a solution is that, up to a set of measure zero
and a subset of $X^0$, $S^0 = S(v^0)$, where $v^0$ is a vector $v$ with not all components
$v_1, \dots, v_n$ zero, and

$D(v^0) = E(v^0) = F(v^0) = 0$.

The problem has now been reduced to finding a vector $v$ with $v_1, \dots, v_n$ not
all zero satisfying $D(v) = E(v) = F(v) = 0$. This problem can be formulated
in various equivalent ways; one is to minimize $D + E + F$.

An inelegant aspect of the above approach is that if $v^0$ is a solution of the
computational problem, then for any positive $\lambda$, $S(\lambda v^0) = S(v^0)$ but $\lambda v^0$ does
not satisfy $F(\lambda v^0) = 0$ unless $\lambda = 1$, that is, $\lambda v^0$ is no longer a solution of the
computational problem but $S(\lambda v^0)$ is still the same solution of our actual
variational problem. This situation arises from our having required the components
$v_i$ ($i = 1, \dots, n$) to be equal to $\Phi_i(v)$, when it is sufficient that they be
proportional with a positive constant of proportionality. Such a proportionality holds
if and only if the function

$\tilde F(v) = \left(\sum_{i=1}^{n} v_i^2\right)^{1/2} \left(\sum_{j=1}^{n} \Phi_j^2(v)\right)^{1/2} - \sum_{i=1}^{n} v_i \Phi_i(v)$

vanishes. The inelegancy could thus be removed by replacing $F(v)$ in the above
discussion by $\tilde F(v)$. The solution of the computational problem could then be
normalized by adding one of the conditions

$\sum_{i=1}^{n} v_i^2 = 1$  or  $\sum_{i=1}^{n+m} v_i^2 = 1$.
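
The effect of replacing $F$ by a scale-invariant criterion can be seen in a
few lines. The sketch below is only an illustration: it treats the
differential coefficients as a fixed, made-up vector `phi` (in the actual
problem they depend on $v$ through $S(v)$, which is unchanged when $v$ is
multiplied by a positive constant), and it uses the Cauchy-Schwarz form of
the proportionality criterion, norm times norm minus inner product. The
function names are ours.

```python
import math

def f_plain(v, phi):
    """F: sum of squared differences between phi_i and v_i."""
    return sum((p - x) ** 2 for x, p in zip(v, phi))

def f_tilde(v, phi):
    """Cauchy-Schwarz gap |v| * |phi| - <v, phi>.

    It is zero exactly when v and phi are proportional with a
    nonnegative constant (given that neither vector is zero).
    """
    norm_v = math.sqrt(sum(x * x for x in v))
    norm_phi = math.sqrt(sum(p * p for p in phi))
    return norm_v * norm_phi - sum(x * p for x, p in zip(v, phi))

phi = [1.0, 2.0, 2.0]            # made-up differential coefficients
v_exact = list(phi)              # v equal to phi
v_scaled = [3.0 * x for x in phi]    # positive multiple of phi
v_opposite = [-x for x in phi]       # negative multiple of phi

print(f_plain(v_exact, phi), f_plain(v_scaled, phi))      # F changes under scaling
print(f_tilde(v_exact, phi), f_tilde(v_scaled, phi))      # the gap stays at zero
print(f_tilde(v_opposite, phi))                           # negative multiples are rejected
```

Either normalization mentioned in the text would then pick a single
representative from each ray of equivalent vectors $v$.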
REFERENCES 

[1] G. B. Dantzig and A. Wald, "On the fundamental lemma of Neyman and Pearson,"
Annals of Math. Stat., Vol. 22 (1951), pp. 87-93.

[2] B. de Finetti, "Sulle stratificazioni convesse," Ann. Mat. Pura Appl. Ser. 4, Vol. 30
(1949), pp. 173-183.

[3] P. R. Halmos, "The range of a vector measure," Bull. Am. Math. Soc., Vol. 54 (1948),
pp. 416-421.

[4] S. L. Isaacson, "On the theory of unbiased tests of simple statistical hypotheses
specifying the values of two or more parameters," Annals of Math. Stat., Vol. 22 (1951),
pp. 217-234.

[5] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical
hypotheses," Philos. Trans. Roy. Soc. London Ser. A, Vol. 231 (1933), pp. 289-337.

[6] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical
hypotheses," Stat. Res. Memoirs, Vol. 1 (1936), pp. 1-37.





MAXIMUM LIKELIHOOD ESTIMATION IN TRUNCATED SAMPLES¹


By MAX HALPERIN²
USAF School of Aviation Medicine


1. Summary. In this paper we consider the problem of estimation of parameters
from a sample in which only the first $r$ (of $n$) ordered observations are known.
If $r = [qn]$, $0 < q < 1$, it is shown under mild regularity conditions, for the
case of one parameter, that estimation of $\theta$ by maximum likelihood is best in
the sense that $\hat\theta$, the maximum likelihood estimate of $\theta$, is

(a) consistent,

(b) asymptotically normally distributed,

(c) of minimum variance for large samples.

A general expression for the variance of the asymptotic distribution of $\hat\theta$ is
obtained and small sample estimation is considered for some special choices of
frequency function. Results for two or more parameters and their proofs are
indicated and a possible extension of these results to more general truncation is
suggested.


2. Introduction. We suppose we are sampling from a univariate population
governed by a probability law, $f(x, \theta)$, $-\infty < x < \infty$, where $\theta$ is a single
parameter. Our sampling process is assumed to be such that for any sample size, $n$,
we have as sample observations only $x_1, x_2, \dots, x_r$, the $r$ smallest
observations in the sample where $r$ is defined for every $n$ by $r = [qn]$. The notation $[a]$
has the usual meaning of the largest integer contained in $a$. It is assumed that
$q$ is known and $0 < q < 1$. Such a sampling process as defined above could
easily arise in an experiment of the life-testing variety.

As a case in point, consider the testing of airplane propeller assemblies in a 
wind tunnel. The assemblies are quite expensive, costing several thousand dol- 
lars each. Furthermore, the test, which consists of increasing the wind velocity 
in the tunnel and observing the velocity at which each assembly is ruptured is 
of the destructive type. That is, if an assembly fails, it is not repairable, while 
if it does not fail, its function is not impaired. Thus, on the basis of budget 
limitations for testing purposes, it may be desirable to limit the number of as- 
semblies that fail. An obvious solution to this problem is to terminate the test- 
ing procedure after a fixed percentage of the propellers in the sample fail. The 
percentage would be fixed in advance so as to keep the total monetary loss 
within budgetary restrictions. Supposing that the velocity required to rupture 
a propeller is a random variable following a continuous probability law, we have 
a simple example of the type of truncated sampling process described above. 
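
The sampling scheme just described is easy to mimic in a few lines. The
following sketch is only an illustration: the function name
`truncated_sample`, the exponential law for the rupture velocity, and the
values of $n$ and $q$ are our own assumptions, not taken from the paper.

```python
import math
import random

def truncated_sample(draw, n, q, rng):
    """Return the r = [q n] smallest of n independent draws, and r itself.

    `draw` is any function taking a random.Random instance and returning
    one observation. Note that q * n is computed in floating point, so the
    integer part can be off by one when q * n lies very close to an integer.
    """
    r = math.floor(q * n)
    ordered = sorted(draw(rng) for _ in range(n))
    return ordered[:r], r

rng = random.Random(1)
# Hypothetical rupture velocities: exponential with unit rate.
observed, r = truncated_sample(lambda g: g.expovariate(1.0), 20, 0.4, rng)
print(r, observed)
```

Testing stops at the $r$th failure, so the remaining $n - r$ assemblies are
known only to have rupture velocities at least as large as the last value
in `observed`.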


¹ Part of a doctoral thesis presented to the Department of Mathematical Statistics,
University of North Carolina. Work on this problem was begun while the author was with
the Rand Corporation.


² Now at National Heart Institute, Bethesda, Maryland.





ESTIMATION IN TRUNCATED SAMPLES 227 


The sampling process we have described may be generalized to the case of 
several parameters and further to the case of several points of truncation, each 
of the latter being defined as a particular sample percentage point. We do not 
consider these generalizations in detail in this discussion. 

To obtain our results we shall need the following assumptions on $f(x, \theta)$. Not
all the assumptions are needed in some of our further discussions, but are listed
here for brevity and easy reference.

ASSUMPTION A. For almost all $x$, the derivatives

(2.1)    $\frac{\partial \log f(x, \theta)}{\partial\theta}$,  $\frac{\partial^2 \log f(x, \theta)}{\partial\theta^2}$,  $\frac{\partial^3 \log f(x, \theta)}{\partial\theta^3}$

exist for every $\theta$ belonging to a nondegenerate interval $R$.
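ASSUMPTION B. For every $\theta$ in $R$ we have

$\left|\frac{\partial f(x, \theta)}{\partial\theta}\right| < F_1(x)$,  $\left|\frac{\partial^2 f(x, \theta)}{\partial\theta^2}\right| < F_2(x)$,

$\left|\frac{\partial^3 f(x, \theta)}{\partial\theta^3}\right| < F_3(x)$,  $\left|\frac{\partial^3 \log f(x, \theta)}{\partial\theta^3}\right| < H(x)$,

where $F_1(x)$, $F_2(x)$, $F_3(x)$ are integrable over $(-\infty, \infty)$, while
$\int_{-\infty}^{\infty} H(x) f(x, \theta)\,dx < M$, where $M$ is independent of $\theta$.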
ASSUMPTION C. For every $\theta$ in $R$


» 2 a 2 
~~ dlog f(x,6) Sasi ( a f(x,0) ) 
= [ (PG) seo wv : a igs 


is greater than zero. Here, if $\theta_0$ is the true value of $\theta$, $\lambda$ is defined by
$q = \int_{-\infty}^{\lambda} f(x, \theta_0)\,dx$. That is, $\lambda$ is the population $100q$ percentage point.


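ASSUMPTION D. $f(x, \theta)$ is continuous in the neighborhood of $x = \lambda$ and has a
continuous derivative in $x$, $f'(x, \theta)$, while

$\frac{\partial \log f(x, \theta)}{\partial\theta}$,  $\frac{\partial^2 \log f(x, \theta)}{\partial\theta^2}$,  $\frac{\partial^3 \log f(x, \theta)}{\partial\theta^3}$

are continuous in the neighborhood of $x = \lambda$.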

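Finally, we define regular estimation from a joint frequency function, say
$h(x_1, \dots, x_r; \theta)$, in a manner completely analogous to that of Cramér ([1],
p. 479). That is, we suppose we can transform $x_1, \dots, x_r$ to new variables $\theta^*$,
$\lambda_1, \dots, \lambda_{r-1}$ (where $\theta^*$ estimates $\theta$), in a one-to-one manner so that

(2.2)    $h(x_1, \dots, x_r; \theta) \prod_{i=1}^{r} dx_i = g(\theta^*; \theta)\, m(\lambda_1, \dots, \lambda_{r-1}; \theta^*, \theta) \prod_{i=1}^{r-1} d\lambda_i\, d\theta^*$,

where $g(\theta^*; \theta)$ is the density of the estimate $\theta^*$, while $m(\lambda_1, \dots, \lambda_{r-1}; \theta^*, \theta)$ is
the conditional density of $\lambda_1, \dots, \lambda_{r-1}$, given $\theta^*$. Then, if $\partial h/\partial\theta$, $\partial g/\partial\theta$, $\partial m/\partial\theta$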







228 MAX HALPERIN 


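exist for every $\theta$ in $R$ and if

$\left|\frac{\partial h}{\partial\theta}\right| < H_0(x_1, \dots, x_r)$,  $\left|\frac{\partial g}{\partial\theta}\right| < G_0(\theta^*)$,  $\left|\frac{\partial m}{\partial\theta}\right| < M_0(\lambda_1, \dots, \lambda_{r-1}; \theta^*)$,

where $H_0$, $G_0$, $\theta^* G_0$, and $M_0$ are integrable over the whole space of $(x_1, \dots, x_r)$,
$\theta^*$, $\theta^*$, and $\lambda_1, \dots, \lambda_{r-1}$, respectively, we shall say we are in a regular
estimation case of the continuous type and $\theta^*$ will be called a regular estimate of $\theta$.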


3. Derivation of results. Since our problem is of prominence in the field of
life-testing, it is convenient to use a terminology which stems from this
connection. It may be remarked that though it is then implied that our random
variable is nonnegative the latter point is in no way critical to our proofs.

Thus, let $f(x, \theta)$, $0 \le x < \infty$, be a probability density satisfying Assumptions
A-D. We suppose that $n$ individuals, each subject to $f(x, \theta)$ as a death law,
have been observed from age zero until $r$ ($= [qn]$) of the group have died at
times $x_1, x_2, \dots, x_r$, $0 \le x_1 \le x_2 \le \dots \le x_r < \infty$. If we denote the sampling
density of $x_1, \dots, x_r$ by $h(x_1, \dots, x_r)$, we clearly have

(3.1)    $h(x_1, \dots, x_r) = \frac{n!}{(n-r)!} \prod_{i=1}^{r} f(x_i, \theta)\, [p(x_r, \theta)]^{n-r}$,

where $p(x_r, \theta) = 1 - q(x_r, \theta) = \int_{x_r}^{\infty} f(x, \theta)\,dx$. If we further denote the
conditional joint density of $x_1, \dots, x_{r-1}$, given $x_r$, by $h(x_1, \dots, x_{r-1}; x_r)$ and
denote the density of $x_r$ in a sample of $n$ by $S_n(x_r)$, we have

(3.2)    $h(x_1, \dots, x_r) = h(x_1, \dots, x_{r-1}; x_r)\, S_n(x_r)$

         $= (r-1)! \prod_{i=1}^{r-1} \left[\frac{f(x_i, \theta)}{q(x_r, \theta)}\right] \cdot \frac{n!}{(r-1)!\,(n-r)!}\, f(x_r, \theta)\, [p(x_r, \theta)]^{n-r} [q(x_r, \theta)]^{r-1}$.
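
To make the likelihood (3.1) concrete, the sketch below evaluates its
logarithm for one particular law. Everything specific here is our own
assumption for illustration: the exponential density
$f(x, \theta) = \theta^{-1} e^{-x/\theta}$ (so that $p(x, \theta) = e^{-x/\theta}$), the data
values, and the function names. For this law the maximizer of (3.1) has the
closed form $\hat\theta = [\sum_{i=1}^{r} x_i + (n - r) x_r] / r$, which the code compares
with neighboring values of $\theta$.

```python
import math

def log_h(theta, xs, n):
    """log of (3.1) for the assumed exponential law with mean theta.

    xs holds the r smallest observations in increasing order.
    """
    r = len(xs)
    x_r = xs[-1]
    log_const = math.lgamma(n + 1) - math.lgamma(n - r + 1)   # log n!/(n-r)!
    log_f = sum(-math.log(theta) - x / theta for x in xs)     # sum of log f(x_i, theta)
    log_p = -x_r / theta                                      # log p(x_r, theta)
    return log_const + log_f + (n - r) * log_p

def mle_exponential(xs, n):
    """Closed-form maximizer of log_h for the assumed exponential law."""
    r = len(xs)
    return (sum(xs) + (n - r) * xs[-1]) / r

xs = [0.1, 0.3, 0.4, 0.9]   # made-up first r = 4 ordered observations
n = 10                      # made-up sample size
theta_hat = mle_exponential(xs, n)
print(theta_hat, log_h(theta_hat, xs, n))
```

The term $(n - r) x_r$ is the contribution of the $n - r$ individuals still
alive when observation stops, which is what distinguishes this estimate
from the ordinary sample mean.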


We have now, denoting by the symbol $E$ the operation of taking an expected
value, the following lemma.

LEMMA 1. If Assumptions A and B hold,

(3.3)    $E\left[\left(\frac{\partial \log h(x_1, \dots, x_r)}{\partial\theta}\right)^2\right] = -E\left[\frac{\partial^2 \log h(x_1, \dots, x_r)}{\partial\theta^2}\right]$.


PROOF. The proof consists of verifying that under Assumptions A-D

(3.3.1)    $\int_{E_r} \frac{\partial h}{\partial\theta} \prod_{i=1}^{r} dx_i = \int_{E_r} \frac{\partial^2 h}{\partial\theta^2} \prod_{i=1}^{r} dx_i = 0$,

and then proceeding exactly as in Cramér ([1], p. 502). Here $E_r$ is the domain of
$x_1, \dots, x_r$.

In order for (3.3.1) to hold we must have $|\partial h/\partial\theta| < H_0(x_1, \dots, x_r)$,
$|\partial^2 h/\partial\theta^2| < H_1(x_1, \dots, x_r)$, where $H_0$ and $H_1$ are integrable over $E_r$. We have







g) (ai, 6) 


a6 [p(z,, 6)" 


Is (x;, @)[p(z,, 0)" [eee , ®), 


(n—r—I1)!c 00 


oh 
|< G@—>) w—E sao) Fi(x;) + 


Hye,0 [ F(x) dx 


(n — 1)! 

from Assumption B. We also know that $\partial f(x, \theta)/\partial\theta$ exists for all $\theta$ in some
interval. Thus we may choose a $\theta_0$ in that interval and assert for all $\theta$ in the interval

$f(x, \theta) < f(x, \theta_0) + F_1(x)\,d = F(x)$,

where $d$ is the length of the continuity interval on $\theta$. We then have


ah 
a8 : es = IT, Fla,)Fi(x) 


4 I F,(2;) [ F(x) dx = Him, --+ , 2,). 
(n—r—Il1)!t 

It is clear that $H_0(x_1, \dots, x_r)$ as just defined is integrable over $E_r$. A similar
discussion holds for $\partial^2 h/\partial\theta^2$ and the lemma follows.

Next we prove the following lemma. 

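LEMMA 2. Let $\theta^* = \theta^*(x_1, \dots, x_r)$ be an unbiased estimate of $\theta$, $\theta^*$ being
continuous and possessing partial derivatives $\partial\theta^*/\partial x_j$ ($j = 1, 2, \dots, r$) in almost all
points $(x_1, \dots, x_r)$. If estimation from $h(x_1, \dots, x_r)$ is regular, we have
asymptotically

(3.4)    $nE(\theta^* - \theta)^2 \ge \frac{1}{K^2}$,

where

$K^2 = \int_0^{\lambda} \left(\frac{\partial \log f(x, \theta)}{\partial\theta}\right)^2 f(x, \theta)\,dx + \frac{1}{p} \left[\int_0^{\lambda} \frac{\partial f(x, \theta)}{\partial\theta}\,dx\right]^2$.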

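PROOF. Consider

$\int_{-\infty}^{\infty} g(\theta^*; \theta)\,d\theta^* = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} m(\lambda_1, \dots, \lambda_{r-1}; \theta^*, \theta) \prod_{i=1}^{r-1} d\lambda_i = 1$,

where $g(\theta^*; \theta)$ and $m(\lambda_1, \dots, \lambda_{r-1}; \theta^*, \theta)$ are as defined in Section 2, in the
definition of a regular estimation case. Under our regularity assumptions on
$m(\lambda_1, \dots, \lambda_{r-1}; \theta^*, \theta)$ and $g(\theta^*; \theta)$, these integrals may be differentiated with
respect to $\theta$ under the integral signs. Thus we have

(3.4.1)    $\int_{-\infty}^{\infty} \frac{\partial \log g}{\partial\theta}\, g(\theta^*; \theta)\,d\theta^* = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{\partial \log m}{\partial\theta}\, m(\lambda_1, \dots, \lambda_{r-1}; \theta^*, \theta) \prod_{i=1}^{r-1} d\lambda_i = 0$.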

Now, referring back to (2.2), we take logarithms of both sides of that relation- 
ship, neglecting differentials, differentiate and have 


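\[ \frac{\partial \log h(x_1, \dots, x_r)}{\partial\theta} = \frac{\partial \log g(\theta^*; \theta)}{\partial\theta} + \frac{\partial \log m}{\partial\theta}. \]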


Squaring both sides of (3.4.2) and taking expected values of each side, we get 


[ (2s log ey h(x, uae Il bic an [ (2 log a) g(0*; 6) de* 
t=1 7 


JE, 00 06 


2 x a 2 r—l 
(3.4.3) + | (a; 6) [  f (=) m {| da, do* 
«“ H— 00 — 00 i=] 
o 2 
>/[ (2 as s) g(0*; 0) do*. 


The cross-product terms, resulting from squaring, all vanish by (3.4.1). Since,
under our assumptions, we can write (cf. [1], p. 475), for $\theta^*$ unbiased,


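\[ E(\theta^* - \theta)^2 \ge \left[\int \left(\frac{\partial \log g}{\partial\theta}\right)^2 g(\theta^*; \theta)\,d\theta^*\right]^{-1}, \tag{3.4.4} \]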


it follows that we have exactly 
2\-1 
— ere 


or by (3.3) 


(3.4.2) 


0 log h(a, --+ , 2) et 
tina) 


From (3.1) we can calculate $\partial^2 \log h(x_1, \dots, x_r)/\partial\theta^2$ in detail, and integrating
out $x_1, \dots, x_{r-1}$, we get, after some manipulation,


1p & log h(x, oem » Xr) ee r q(x, ly 6) 
0 


On? Bante 1) dz, 


oe : S,-(z,) dx, 


ei n—-1 [ dq(x,, 6) |° . 
(3.4.6) + rot | o S,-2(z,) dz, 


+? [ If pre "(2s fe wel *) f (x, 8) az | Sn—1(te-1) dt-—1 


ae 


oe * a log f(z,, 8) S,(2,) dx 
n Jo a aa e 


In (3.4.6)

\[ S_{n-c}(x_{r-d}) = \frac{(n-c)!}{(r-d-1)!\,(n-c-r+d)!}\,[q(x_{r-d}, \theta)]^{r-d-1}\,[p(x_{r-d}, \theta)]^{n-c-r+d}\,f(x_{r-d}, \theta). \]







That is, $S_{n-c}(x_{r-d})$ is simply the sampling likelihood of the $(r - d)$th smallest
order statistic in a sample of size $(n - c)$. With this understanding the basis for
the right-hand side of (3.4.6) is readily apparent. If we consider the last term
on the right-hand side of (3.4.6), we see that


1| [°0 log f(z,, 8) 1 fr | a log f (x, 8) 
- a? * S, r. r a — ’ 
n| Jo 0g? ,) de, | S Vnbo | 0g? TG) ds 


which is $O(1/\sqrt{n})$, since from our assumptions the integral exists. Hence, asymptotically we can disregard such a term. Now we consider the integrands of the remaining terms of (3.4.6) and on the integrand containing $S_{n-c}(x_{r-d})$, we perform the transformation

\[ x_{r-d} = \lambda + \frac{ay}{\sqrt{n-c}}, \tag{3.4.7} \]

where $a = \sqrt{qp}/f(\lambda, \theta)$. It follows from Cramér [1], pp. 367-9, that the functions $(a/\sqrt{n-c})\,S_{n-c}(\lambda + ay/\sqrt{n-c})$ each converge to $(1/\sqrt{2\pi})\exp(-\tfrac12 y^2)$ in any finite $y$ interval and are each uniformly bounded in any such interval. Denoting the function associated with $S_{n-c}(\lambda + (a/\sqrt{n-c})y)$ by $g_{n-c}(\lambda + (a/\sqrt{n-c})y)$, it is apparent that we can expand $g_{n-c}$ in series about $y = 0$ to zero-order terms plus a remainder, and consequently that


\[ \lim_{n\to\infty} g_{n-c}\left(\lambda + \frac{a}{\sqrt{n-c}}\,y\right) = g(\lambda), \text{ say,} \tag{3.4.8} \]

for any fixed $y$. Furthermore, it is clear that each $g_{n-c}$ is uniformly bounded in
every $y$ interval, finite or infinite. Thus the general relation desired is that


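\[ \lim_{n\to\infty}\int_{-\infty}^{\infty} S_n(y)\,g_n(y)\,dy = \frac{g}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp(-\tfrac12 y^2)\,dy, \tag{3.4.9} \]

where we know that $g_n(y)$ and $S_n(y)$ converge to $g = g(\lambda)$ and $(1/\sqrt{2\pi})\exp(-\tfrac12 y^2)$
respectively, for any fixed $y$, and that $g_n(y)$ is absolutely bounded by a constant,
$G$, while $S_n(y)$ is uniformly bounded in any finite $y$ interval. To establish (3.4.9)
we choose a $y_0 > 0$ such that, for any preassigned $\epsilon > 0$

\[ \frac{1}{\sqrt{2\pi}}\int_{|y|\le y_0}\exp(-\tfrac12 y^2)\,dy = 1 - \frac{\epsilon}{6(G + |g|)}. \]

We can also write

\[ \left|\int_{-\infty}^{\infty} g_n(y)S_n(y)\,dy - \frac{g}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp(-\tfrac12 y^2)\,dy\right| \le \left|\int_{|y|\le y_0} g_n(y)S_n(y)\,dy - \frac{g}{\sqrt{2\pi}}\int_{|y|\le y_0}\exp(-\tfrac12 y^2)\,dy\right| + G\int_{|y|>y_0} S_n(y)\,dy + \frac{|g|}{\sqrt{2\pi}}\int_{|y|>y_0}\exp(-\tfrac12 y^2)\,dy. \]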







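Since for $|y| \le y_0$, $g_n(y)S_n(y)$ converges everywhere to $(g/\sqrt{2\pi})\exp(-\tfrac12 y^2)$ and
since $g_n(y)S_n(y)$ is uniformly bounded in this interval, it follows that we can
choose an $n_0'$ such that for $n > n_0'$,

\[ \left|\int_{|y|\le y_0} g_n(y)S_n(y)\,dy - \frac{g}{\sqrt{2\pi}}\int_{|y|\le y_0}\exp(-\tfrac12 y^2)\,dy\right| < \tfrac13\epsilon. \]

We also have by construction

\[ \frac{|g|}{\sqrt{2\pi}}\int_{|y|>y_0}\exp(-\tfrac12 y^2)\,dy = \frac{|g|\,\epsilon}{6(G + |g|)} < \frac{\epsilon}{6}. \]

Finally, we have

\[ G\int_{|y|>y_0} S_n(y)\,dy = G\left[1 - \int_{|y|\le y_0} S_n(y)\,dy\right], \]

and for $|y| \le y_0$, we can choose an $n_0''$ such that for $n > n_0''$

\[ \int_{|y|\le y_0} S_n(y)\,dy > \frac{1}{\sqrt{2\pi}}\int_{|y|\le y_0}\exp(-\tfrac12 y^2)\,dy - \frac{\epsilon}{6G}, \]

so that

\[ G\int_{|y|>y_0} S_n(y)\,dy < G\left[1 - 1 + \frac{\epsilon}{6(G + |g|)} + \frac{\epsilon}{6G}\right] < \frac{\epsilon}{3}. \]

Thus, choosing $n_0 = \max(n_0', n_0'')$, we can assert that for any preassigned $\epsilon > 0$,
we can find an $n_0$ such that for $n > n_0$,

\[ \left|\int_{-\infty}^{\infty} g_n(y)S_n(y)\,dy - \frac{g}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp(-\tfrac12 y^2)\,dy\right| < \epsilon. \]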


Thus (3.4.9) holds. Then taking the limit of (3.4.6) and simplifying, (3.4) follows. 
We are now ready to prove the following theorem. 
THEOREM 1. The likelihood equation

\[ \frac{\partial \log h(x_1, \dots, x_r)}{\partial\theta} = \sum_{i=1}^{r}\frac{\partial \log f(x_i, \theta)}{\partial\theta} + (n - r)\frac{\partial \log p(x_r, \theta)}{\partial\theta} = 0 \tag{3.5} \]

corresponding to (3.1) has a root $\hat\theta$ which

(1) converges in probability (i.p.) to the true value of $\theta$;

(2) is asymptotically normally distributed;

(3) is asymptotically efficient.

PROOF. First we show $\hat\theta$ converges i.p. to the true value of $\theta$, $\theta_0$, say. We can
write

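\[ \frac1n\frac{\partial \log h(x_1, \dots, x_r)}{\partial\theta} = \frac1n\left(\frac{\partial \log h}{\partial\theta}\right)_{\theta_0} + (\theta - \theta_0)\frac1n\left(\frac{\partial^2 \log h}{\partial\theta^2}\right)_{\theta_0} + \tfrac12\Delta(\theta - \theta_0)^2\,T(x_1, \dots, x_r) = B_0 + (\theta - \theta_0)B_1 + \tfrac12\Delta(\theta - \theta_0)^2 B_2. \tag{3.5.1} \]

Here $|\Delta| < 1$, the subscript $\theta_0$ denotes evaluation at $\theta_0$, and $B_0$, $B_1$, $B_2$ are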







functions of the random variables $x_1, \dots, x_r$. We note that for the method of
proof used here we must have

\[ \frac1n\left|\frac{\partial^3 \log h}{\partial\theta^3}\right| < T(x_1, \dots, x_r), \tag{3.5.2} \]

where

\[ E\,T(x_1, \dots, x_r) < M, \tag{3.5.3} \]

where $M$ is a positive constant independent of $\theta$. If we assume, in addition to
Assumptions A-D, that $1/p(x_r, \theta)$ is bounded independent of $\theta$, say by $I(x_r)$,
where $E[I(x_r)] < J$, where $J$ is independent of $\theta$, (3.5.2) and (3.5.3) are easily
seen to hold. The calculations are simple and are omitted.

Now we consider the characteristic function, $\phi_0(t)$, of $B_0$. We have


“ Sr ae ) 
do(t) = [ ex p| 4 \z! 8 log Liss 6) _ (™m-—r) af (zx, 6) az} 


lex € 60) 0 06 
(3.5.4) ; r 


r 
~ h(a, --+, 2) [I dz,, 
i=1 
or integrating on 2 ,°** , 2-1, 


a@® r—l1 n—r 
= | E (J : r») V ( , : r»)| S,(2,) dz,, 
/0 n n 


where 


Ir ] 
[ exp} jit = eae o f(x, 0) dx 
0 ; 06 
q(2r 5 8) 


fit 8 log f(x, , 6) | \ 


= exp< 
\ n 06 


/ 


(§ ) _f{ itdag(z,,0) 1 
W \-,2,}) = exp, —- —— ———~;. 
n an 06 =—s p(a,, 8) 
Now on the integrand of (3.5.4), we perform the transformation (3.4.7) and 
have 
a ay ay 
— r d 
Tale (ist Vall V G+ Za) 


[Ge Zl 80+ %): 


We want the limit for fixed $y$ and $t$ of (3.5.5). From [1], pp. 367-69, we have


(3.5.5) 


\[ \lim_{n\to\infty}\frac{a}{\sqrt{n}}\,S_n\left(\lambda + \frac{ay}{\sqrt{n}}\right) = \frac{1}{\sqrt{2\pi}}\exp(-\tfrac12 y^2) \]







for every fixed $y$. Also

\[ \lim_{n\to\infty}\log V\left(\frac{t}{n}, \lambda + \frac{ay}{\sqrt{n}}\right) = \lim_{n\to\infty}\frac{it}{n}\,\frac{\partial \log f(\lambda + ay/\sqrt{n}, \theta)}{\partial\theta} = 0 \]

for every fixed $y$ and $t$, from the continuity of $\partial \log f(x, \theta)/\partial\theta$ about $x = \lambda$. If
we now consider the function $U(t, \lambda + ay)$, we see that for every fixed $y$ it is a
characteristic function. Further $(\partial U/\partial t)_{t=0}$ exists for every fixed $y$. Thus we can
expand $U(t, \lambda + ay)$ in a series in $t$ and have for values of $t$ near zero


A+ay af (a, 6) it 
Ut,» + a) = 14 | a0 ga + ay, 6) 


(3.5.6) [ are af(z, 6) 8 log fie o\ toy af (x, 0) 
vs \*9 exp iA; See dx -| . ’ as ft 
0 06 ‘i 0 


0 
+ = a ~~ - ‘ 
q(y : ay, 6) 
where $|\Delta_1| < 1$ and the term in square brackets goes to zero with $t$. We can
also write, for example


i f(z, 9) | » (% wie. ) 
Jo 00 
qa 4 + ay) 7 q 





A+aAoy 
af(\ + adzy, 4) [ f(z, 6) d 
0 06 


ay Of(A + adzy, 6) _ 


+ q(y + ade y, 8) 00 qa + ade y, 0) 


where $|\Delta_2| < 1$. Then putting $t/n$ and $y/\sqrt{n}$ for $t$ and $y$ respectively, we get


[ af(zx, 8) d 


ft ay 0 00 - it pi(n, t, y) 
(3.5.7) l +—=)=1+ ~* aoe. 
nwvn q n n 


where $\rho_1(n, t, y)$ approaches zero for any fixed $y$ and $t$, as $n \to \infty$. Similar considerations lead us to


* af(a, 0) 
00 de it , po(n, t, y) 
(3.5.8) W (é. A+ ae) = 3 ate PAN, by) 


p n n 


where $\rho_2(n, t, y)$ approaches zero for any fixed $y$ and $t$ as $n \to \infty$. It follows that
(3.5.5) becomes, asymptotically,


BY A \ 
af(ax, 8) Of(a, @) 1 ; 
3.5.9) ex I a -| =——-— dz> | —= exp (—}7’). 
(3.5 exp gE ; 30 dx : a6 wie 2Y 


Since (3.5.5) meets the conditions indicated for validity of (3.4.9), we can apply 







a convergence argument as indicated in Lemma 2 and conclude 


» 
(35.10) limedé = exp [ie I wo a - I “ss” az}, 


so that By converges i.p. to zero. 

Similar arguments lead us to

\[ \lim_{n\to\infty} E\exp(itB_1) = \exp(-K^2 it), \tag{3.5.11} \]

and

\[ \lim_{n\to\infty} E\exp(itB_2) = \exp(M' it), \tag{3.5.12} \]

so that $B_1$ converges i.p. to $-K^2$ while $B_2$ converges i.p. to $M' < M$, a positive
constant independent of $\theta$. The precise argument given in [1], pp. 502-3, may
then be employed to show that (3.5) has a solution, $\hat\theta$, which converges in probability to $\theta_0$. We omit these arguments.

Now from (3.5.1), we have 


_s* (“a log s*) 
(3.5.13) K Vn(@ — %) = K Vn ae 
_B, as B (6 — 6) 
Kk? aR . 


The denominator of the right-hand side of (3.5.13) converges i.p. to 1, so that
we may infer by well known theorems that the asymptotic distribution of the
ratio is simply the asymptotic distribution of the numerator. Thus we need
only show that $(1/K\sqrt{n})(\partial \log h/\partial\theta)_{\theta_0}$ is asymptotically normal with zero mean
and unit variance in order to complete the proof of our theorem. Denoting the
characteristic function of $(1/K\sqrt{n})(\partial \log h/\partial\theta)_{\theta_0}$ by $\phi(t)$, we have, by virtue of
(3.5.4)


\[ \phi(t) = \phi_0\left(\frac{\sqrt{n}\,t}{K}\right). \tag{3.5.14} \]


Applying the transformation (3.4.7) to (3.5.14) we can exactly as before show
that $V(t/K\sqrt{n}, \lambda + ay/\sqrt{n})$ converges to 1 for every fixed $y$ and $t$, while
$(a/\sqrt{n})S_n(\lambda + ay/\sqrt{n})$ converges to $(1/\sqrt{2\pi})\exp(-\tfrac12 y^2)$ for every fixed $y$. If
now we turn our attention to $U(t, \lambda + ay)$, we see that $U$ can be expanded near
$t = 0$ in powers of $t$ to terms of order $t^2$ plus a remainder of order $o(t^2)$, since
the second moment of the distribution corresponding to $U$ exists for every fixed
$y$. Similar remarks apply to $W(t, \lambda + ay)$. Thus, by manipulation of the type
employed in obtaining an asymptotic representation of (3.5.5), we find that
the similar result for the integrand of (3.5.14) is


1 ' Of (x, @) 
see Hi — er 55 ae) 


d log i 6) 1 / [* af(a, @) aa 
+ [ (2 pera = ar) |: 


(3.5.15) 





Since (3.5.14) meets the conditions indicated for validity of (3.4.9), we can, for
any fixed $t$, carry out a convergence argument as indicated in Lemma 2, to
obtain


1 o . 
a abe lial 
lim ¢(t) = \/On [exp ( it") 


; = oe [ af(a, 8) ) 
exp | 3\¥ KVJee [a dx dy 


Theorem 1 follows. 


(3.5.16) 


4. Generalizations. One can generalize the discussion of Section 3 to the case
of several parameters and show that maximum likelihood estimation of $\theta =
(\theta_1, \dots, \theta_p)$ from $h(x_1, \dots, x_r)$ is a best estimation procedure in the sense
that

(a) the maximum likelihood equations have a set of solutions $(\hat\theta_1, \dots, \hat\theta_p)$
which are consistent;

(b) $\sqrt{n}(\hat\theta - \theta)$ has a multivariate normal limit law;

(c) the covariance matrix of the limiting distribution of $\sqrt{n}(\hat\theta - \theta)$ is the best
matrix in the sense of Cramér [1].

In connection with (c) we mean specifically that the concentration ellipsoid
corresponding to the covariance matrix of the limiting distribution of $\sqrt{n}(\hat\theta - \theta)$
is identical with


P : — 
(4.2) z E : log : igs (us = 0:)( uj Pe 0;) sat + 2. 
é, jan 00; 06; 


The ellipsoid (4.2) is shown by Cramér [1] to lie wholly within the ellipsoid
corresponding to any set of regular unbiased estimates of $\theta_1, \dots, \theta_p$. The meaning of "regular" here is precisely in the sense of Cramér [1] as applied to the
joint frequency function $h(x_1, \dots, x_r)$. The assumptions necessary to obtain
the result are the natural analogues of Assumptions A-D. Thus A, B, D are
extended by imposing similar conditions upon the various derivatives up to
third order, that is those with respect to each $\theta_i$ and also the mixed derivatives.
The condition C becomes a requirement that the matrix with elements


= . En) 0 bos £0) 


1 [ af(a, 0) Vf af (zx, @) ) ati 
= ect ads = 1,2,---,p, 
rT; ( [ ah hee oe P 


be positive definite. The additional assumption on $p(x_r, \theta)$ specified in Theorem
1 remains unchanged except that $\theta$ is taken to be a vector parameter.

Under the assumptions outlined above the proof of (a), (b), (c) follows the
lines of Section 3.

A direction of further generalization is to the case of several points of truncation, each truncation point being a sample percentage point. This work has
not been carried out in detail, but due to the asymptotic joint normality of
sample percentage points, it appears clear that results of the nature of (a), (b),
(c) would hold under conditions analogous to those given by Assumptions A-D.


5. Small-sample estimation. For samples of the type considered in Section 3,
one can obtain small-sample results for two important special choices of $f(x, \theta)$.

Case A. $f(x, \theta) = \theta e^{-\theta x}$, $0 \le x < \infty$, $\theta > 0$.

For this case we have from (3.1)

\[ h(x_1, \dots, x_r) = \frac{n!}{(n-r)!}\,\theta^r \exp\left\{-\theta\left[\sum_{i=1}^{r-1} x_i + (n - r + 1)x_r\right]\right\}. \tag{5.1} \]

From (3.4) it appears that in any regular estimation case an estimate $\theta^*$ will be
such that

\[ E(\theta^* - \theta)^2 \ge \frac{\theta^2}{nq} \tag{5.1.1} \]

for $\theta^*$ unbiased. From (5.1) we obtain the maximum likelihood estimate of $\theta$ as

\[ \hat\theta = \frac{r}{\sum_{i=1}^{r-1} x_i + (n - r + 1)x_r} = \frac{r}{y}, \text{ say.} \tag{5.1.2} \]

It is easy to show by calculating its moment generating function that the
random variable, $2\theta y$, has a chi-square distribution with $2r$ degrees of freedom.

It follows that

4 ro , 6° 6° 
(5.1.3) Eé = ——. ~ 8, Var 6 = ~—., 
r—1 r—l nq 

Thus $\hat\theta$ is a best estimate in the sense of Theorem 1. $\hat\theta$ can, of course, be corrected for bias, the variance of the adjusted estimate then being $\theta^2/(r - 2)$.
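The closed form (5.1.2) is easy to check numerically. The following is a minimal Python sketch, not part of the original paper: the function names, the parameter values and the use of the standard-library `random` module are illustrative assumptions. It computes the estimate $r/y$ from the $r$ smallest observations and estimates its mean by simulation, which under the chi-square result above should be close to $r\theta/(r - 1)$.

```python
import random


def censored_exponential_mle(first_r, n):
    """MLE of theta from the r smallest of n exponential observations."""
    r = len(first_r)
    x_r = max(first_r)  # largest observed value, i.e. the truncation point
    # Sum of the r observed values plus (n - r) copies of x_r; this equals
    # sum_{i<r} x_i + (n - r + 1) x_r, the quantity called y in (5.1.2).
    y = sum(first_r) + (n - r) * x_r
    return r / y


def mean_of_mle(theta, n, r, trials=20000, seed=0):
    """Monte Carlo estimate of E[theta_hat] (hypothetical helper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = sorted(rng.expovariate(theta) for _ in range(n))
        total += censored_exponential_mle(sample[:r], n)
    return total / trials
```

For example, `mean_of_mle(2.0, 20, 10)` should land near $10 \cdot 2/9 \approx 2.22$ rather than 2, which illustrates why the bias correction mentioned above is worthwhile for small $r$.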


Case B. $f(x, \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}}\exp\left[-\dfrac{(x - \mu)^2}{2\sigma^2}\right]$, $-\infty < x < \infty$.


This case is of marked interest since it is frequently assumed in life testing 
that the logarithm of time to death is normally distributed. Essentially this 
case has also been considered by Hald [3] and Cohen [5]. We indicate the solu- 
tion for completeness. It can be shown that 


(5.2) = Kx h+Vizz yp), w= a — hi, 


where 







and h is the solution of 


os tone ae - 
= [ exp (—42°) dz 
V 2x Jn 

It is easy to show that the right and left sides of (5.2.1) are monotone de- 
creasing and increasing respectively. This implies the uniqueness of the solution 
and also affords a simple method of solving (5.2.1) with the aid of a table of 
ordinates and areas of the standardized normal distribution. Despite the fairly 
formidable appearance of (5.2.1) the solution goes quickly. 

It is also simple and interesting to calculate the asymptotic efficiency of the
estimate of $\mu$ from a truncated sample when $\sigma$ is known (the efficiency being
considered relative to a completely known sample). For selected values of
$q = r/n$, approximate efficiencies are given in the following table. Approximate
efficiency of $x_{(qn)}$, the sample percentage point, in estimating $\mu$ is also given to
indicate the extent to which one gains by using the other $(r - 1)$ observations
in the estimation procedure.


ar_htvVve+V 


n-r v2 


q , iq 2 aa ‘ «A .6 Zz 8 9 
heft f 26: 63 66. .75 ..82 66.01. 96. 36 
Ef x4on) 8 2 FT «26 21 © @ -« 2 
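Efficiencies of this kind can be approximated with a short computation. The sketch below is an illustration under stated assumptions rather than the author's own calculation: it takes $\sigma = 1$, assumes the sample is truncated on the right at the population $q$-quantile $h$, evaluates $K^2$ of (3.4) with the integrals taken over $x \le h$, and uses the usual large-sample variance $qp/(nf^2)$ for the sample percentage point. The function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, sigma assumed equal to 1


def efficiency_truncated_mle(q):
    """Assumed asymptotic efficiency of the MLE of mu when only the
    smallest fraction q of the sample is observed (sigma known)."""
    h = _STD_NORMAL.inv_cdf(q)   # standardized truncation point
    phi = _STD_NORMAL.pdf(h)     # normal ordinate at h
    # K^2 = integral of z^2 phi(z) over z <= h, plus (dq/dmu)^2 / p
    return (q - h * phi) + phi * phi / (1.0 - q)


def efficiency_percentage_point(q):
    """Assumed asymptotic efficiency of the sample q-quantile alone."""
    h = _STD_NORMAL.inv_cdf(q)
    phi = _STD_NORMAL.pdf(h)
    return phi * phi / (q * (1.0 - q))
```

Under these assumptions `efficiency_truncated_mle` gives about .75, .82 and .96 at $q = .4, .5, .8$, and `efficiency_percentage_point(0.5)` equals $2/\pi$, the familiar efficiency of the median.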


Acknowledgment. The author is indebted to Dr. A. M. Mood who suggested 
this problem. 


REFERENCES 

[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[2] H. Cramér, "A contribution to the theory of statistical estimation," Skandinavisk
Aktuarietidskrift, Vol. 29 (1946), pp. 85-94.

[3] A. Hald, "Maximum likelihood estimation of the parameters of a normal distribution
which is truncated at a known point," Skandinavisk Aktuarietidskrift, Vol. 32
(1949), pp. 119-34.

[4] A. M. Mood, "An asymptotic distribution for a mortality problem," Rand Corporation
document RM-111, 1949, unpublished.

[5] A. C. Cohen, "Estimating the mean and variance of normal populations from singly
truncated and doubly truncated samples," Annals of Math. Stat., Vol. 21 (1950),
pp. 557-569.





ON THE COMPARISON OF SEVERAL EXPERIMENTAL
CATEGORIES WITH A CONTROL¹


By EDWARD PAULSON
University of Washington


Summary. This paper investigates certain statistical problems arising in the
determination of the "best" of $k$ categories when comparing $k - 1$ experimental
categories with a standard or control. The discussion is limited to the case of
a single stage sampling procedure with an equal number of observations on
each of the $k$ categories. Results both of an exact and of an approximate nature
are obtained when (a) the observations with each category are normally distributed, and (b) the observations with each category have a binomial distribution.


1. Introduction. In this paper we will be concerned with the problem of the
selection of one of the $k$ categories $\Pi_1, \Pi_2, \dots, \Pi_k$ as best when category $\Pi_1$
plays a special role, since it represents the standard or control, while $\Pi_2, \Pi_3,
\dots, \Pi_k$ represent $k - 1$ experimental categories. For the type of application
we have in mind, the $k$ categories might represent $k$ varieties of wheat, or $k$
drugs, or $k$ machines; the "goodness" of a category will depend on some parameter of the probability distribution associated with that category. The experimental categories can be classified into two groups: one group consisting of
those categories which are superior to $\Pi_1$ and a second group consisting of those
experimental categories which are inferior to or at most equal to $\Pi_1$. In such a
situation it will usually be desirable to have special protection against the
selection of an experimental category as best when it actually is inferior to $\Pi_1$.
This will be accomplished by requiring that the statistical procedure used provide a special assurance that $\Pi_1$ will be selected as best if the second group
happens to consist of all $k - 1$ experimental categories, that is, none of the
experimental categories is superior to $\Pi_1$. Situations of this type are believed to
be fairly common in experiments in medicine and agriculture.

We will therefore consider the following statistical problem: given a sample
consisting of $kn$ independent observations $\{x_{ij}\}$ $(i = 1, 2, \dots, k;\ j = 1, 2, \dots,
n)$, where $x_{ij}$ is the $j$th observation with category $\Pi_i$, to devise a statistical
procedure for selecting one out of the $k$ categories as best so that if none of the
experimental categories $\Pi_2, \Pi_3, \dots, \Pi_k$ is actually "superior" to $\Pi_1$, then the
probability that $\Pi_1$ is selected will be $\ge 1 - \alpha$. We will also consider the related
problem of deciding how large a sample will be required so that when one of
the experimental categories is really superior to all the others including $\Pi_1$ by
a specified amount the probability will also be $\ge 1 - \beta$ that this experimental
category will be selected as best. The constants $\alpha$ and $\beta$ might be considered as

¹ Work on this paper done under the sponsorship of the Office of Naval Research.





240 EDWARD PAULSON 


roughly analogous to the type I and type II errors in the Neyman-Pearson
theory of testing a hypothesis. However the present problem is of a multiple
decision type, and is not equivalent to one involving the testing of a hypothesis
unless $k = 2$.

For the most part the discussion will be confined to the normal case, when
the $n$ observations with category $\Pi_i$ $(i = 1, 2, \dots, k)$ are assumed to be normally and independently distributed with mean $m_i$ and common variance $\sigma^2$;
the best category is defined to be the one associated with the greatest value of
$m_i$.

A brief discussion will also be given of the binomial case, when each observation with category $\Pi_i$ is classified as a "success" or "failure" with a probability
$P_i$ of being a "success"; the best category is defined to be the one associated
with the greatest value of $P_i$.


2. The normal case with known variance. We will first treat the problem when
$\sigma$ is assumed to be known a priori. Let $\bar x_i = \frac1n\sum_{j=1}^{n} x_{ij}$, $\bar x^* = \max(\bar x_2, \bar x_3, \dots,
\bar x_k)$, $\alpha_1 = \alpha/(k - 1)$, and for any $\alpha$ $(0 < \alpha < 1)$ let $\nu_\alpha$ be defined by the equation

\[ \frac{1}{\sqrt{2\pi}}\int_{\nu_\alpha}^{\infty} e^{-\frac12 t^2}\,dt = \alpha. \]

Let $\Pi^*$ be the experimental category whose mean is $\bar x^*$, and let $\lambda$ be a constant
whose value will be determined in a moment. The following statistical procedure
is proposed for the selection of the best category:


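\[ \text{if } \bar x^* - \bar x_1 \ge \lambda\sigma\sqrt{2/n}, \text{ select } \Pi^*; \qquad \text{if } \bar x^* - \bar x_1 < \lambda\sigma\sqrt{2/n}, \text{ select } \Pi_1. \tag{1} \]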


We now complete the specification of the statistical procedure by determining $\lambda$.
It is obvious that when $m_1 \ge \max(m_2, m_3, \dots, m_k)$, the greatest lower bound
of the probability that $\bar x^* - \bar x_1 < \lambda\sigma\sqrt{2/n}$ will occur when $m_1 = m_2 = \dots =
m_k$. In order to evaluate $P\{\bar x^* - \bar x_1 < \lambda\sigma\sqrt{2/n} \mid m_1 = m_2 = \dots = m_k\}$ we
use the fact that $\bar x^*$ and $\bar x_1$ are independent, and find, after some simplifications,
/9 } 
P (Z* — 2, < ho Vn Mm, = M. = +++ = Me? 


(1) 


x 


. —}s' . . k—2 
= | . e at| dz dr 


“—o* 
Cl 


The constant \ will therefore be given as the root of the equation 


gre pe a 
(2) | . i edt] dr=1—a. 
‘Qe © V oT x 


Tv 





COMPARISON OF EXPERIMENTAL CATEGORIES 241 
From equation (2) it is possible to tabulate the values of $\lambda$ as a function of $\alpha$
and $k$. Pending the construction of adequate tables, we will give an approximate
method for finding $\lambda$ with a definite limit of error. For this purpose, let $A_i$ stand
for the event $\bar x_i - \bar x_1 \ge \lambda\sigma\sqrt{2/n}$. We have $P\{\bar x^* - \bar x_1 < \lambda\sigma\sqrt{2/n}\} = 1 -
P(A_2 + A_3 + \dots + A_k)$, where the probabilities involved are to be calculated
for the case when all the $k$ means are equal. Making use of Bonferroni's Inequality (see [1], p. 75) we have


\[ 1 - \sum_{i=2}^{k} P(A_i) + \sum_{i<j} P(A_i \cdot A_j) \ge P\{\bar x^* - \bar x_1 < \lambda\sigma\sqrt{2/n}\} \ge 1 - \sum_{i=2}^{k} P(A_i). \tag{3} \]


Due to the symmetry when the means are equal, this becomes

\[ 1 - (k - 1)P(A_2) + \frac{(k - 1)(k - 2)}{2}P(A_2 \cdot A_3) \ge P\{\bar x^* - \bar x_1 < \lambda\sigma\sqrt{2/n}\} \ge 1 - (k - 1)P(A_2). \]


\ 
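Since $(\bar x_2 - \bar x_1)$ and $(\bar x_3 - \bar x_1)$ have a bivariate normal distribution with correlation $= \tfrac12$, we obtain

\[ P(A_2) = \frac{1}{\sqrt{2\pi}}\int_{\lambda}^{\infty} e^{-\frac12 t^2}\,dt, \]

\[ P(A_2 \cdot A_3) = \frac{1}{\sqrt{3}\,\pi}\int_{\lambda}^{\infty}\int_{\lambda}^{\infty}\exp\left(-\tfrac23[x^2 - xy + y^2]\right)dx\,dy. \]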
T oO wd r 


If we use as the approximate value for $\lambda$ the solution $\tilde\lambda$ of the equation

\[ \frac{1}{\sqrt{2\pi}}\int_{\tilde\lambda}^{\infty} e^{-\frac12 t^2}\,dt = \alpha_1, \tag{4} \]

then from (3) it follows that $P(\tilde\lambda) = P\{\bar x^* - \bar x_1 < \tilde\lambda\sigma\sqrt{2/n}\}$ will exceed $1 - \alpha$
by an amount which is not greater than $\tfrac12(k - 1)(k - 2)P(A_2 A_3)$. This quantity
can be calculated from the tables of the volumes of the normal bivariate surface
[2]. The calculations for several values of $\alpha$ and $k$ are summarized in Table I.
It appears that the approximation yields good results for values of $\alpha$ which
ordinarily are of interest if $k$ is not too large, say $\le 6$.
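The Bonferroni approximation can be checked by simulation. The following Python sketch is illustrative only: the function names, the Monte Carlo check and the default arguments are assumptions made here, not part of the paper. It takes the approximate constant to be the upper $\alpha/(k - 1)$ point of the standard normal distribution, as in (4), and estimates the probability that the control is retained when all $k$ means are equal.

```python
import random
from statistics import NormalDist


def bonferroni_lambda(alpha, k):
    """Approximate lambda: upper alpha/(k - 1) point of the standard normal."""
    return NormalDist().inv_cdf(1.0 - alpha / (k - 1))


def prob_control_selected(alpha, k, n, sigma=1.0, trials=50000, seed=0):
    """Monte Carlo estimate of P(control selected) when all means are equal."""
    rng = random.Random(seed)
    threshold = bonferroni_lambda(alpha, k) * sigma * (2.0 / n) ** 0.5
    sd_of_mean = sigma / n ** 0.5  # each sample mean is N(m, sigma^2 / n)
    kept = 0
    for _ in range(trials):
        control_mean = rng.gauss(0.0, sd_of_mean)
        best_experimental = max(rng.gauss(0.0, sd_of_mean) for _ in range(k - 1))
        # Procedure (1): keep the control unless the best experimental
        # mean exceeds it by at least the threshold.
        if best_experimental - control_mean < threshold:
            kept += 1
    return kept / trials
```

With $\alpha = .05$ and $k = 4$ the constant is about 2.128, and the simulated probability should fall slightly above .95, in line with the bounds discussed above.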

Any statistical procedure for selecting one of the $k$ categories can, of course,
lead to other types of error than that of selecting an experimental category as
the best when it actually is inferior to the standard or control. In particular,
the error in not selecting a particular experimental category as best when it
actually is superior to all the other experimental categories and the standard or
control by at least a specified amount is of considerable interest. For a fixed
value of $\alpha$, this new type of error can be reduced only by increasing $n$, the sample
size. Suppose for convenience that $\Pi_k$ is the particular experimental category
that exceeds the others by an amount $\Delta$; that is, $m_k = \max(m_1, m_2, \dots,
m_{k-1}) + \Delta$. Using the statistical procedure of (1), it is easy to see that for a
fixed $\lambda$, $\Delta$, $k$, and $n$ the greatest lower bound of the probability that $\Pi_k$ will be
selected as best will occur when $m_1 = m_2 = \dots = m_{k-1} = m$ and $m_k = m + \Delta$.
TABLE I 
Limits for P(X) 
02 05 


\ = 2.326 dh = 1.960 


.981 = P(X) = .98 .955 = P(A) = 





P(X) .98 


If we denote this greatest lower bound by $P(n; \lambda, k, \Delta)$, we easily obtain
P(n; X, k, 4) 
wal 


9 
(F, - 4 > ro ‘f and #, > max (#2, #3, +++ , 1) | m = m+ a 
n 


\ J 


\ 


2 1 w—AV/2 _agt 
- [ | Taz [ me at| 


w e hs? s e ge? k—3 \ e 4(w—(A/e)y )? 
{ [ (k — 2) | = at| 5 teaatiecontadaat 
oo oT J 


24 Lt Vv V2r 


co 1 w+(A/o)/n—-r\V/2 ie 1 wtl(A/oJ/n Satie P—2 jw? 
/ == 'f é " dt |- — | e* dt — dw. 
oLV2r sx V2 Lx 


V2 
In order to decide in advance of the experiment how large a sample should be
taken, we might try to find $n$ so that $P(n; \lambda, k, \Delta) = 1 - \beta$ where $\alpha$, $\beta$, and $\Delta$
are determined by practical considerations depending on the particular experimental situation, and $\lambda$ is found from (2) or (4). It will be very difficult to find
$n$ directly from (5) until tables are made available. However we can usually
obtain a good approximation $\hat n$ to the required value by solving the equation


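\[ P\{\bar x_k - \bar x_1 < \lambda\sigma\sqrt{2/n} \mid m_k = m_1 + \Delta\} = \frac{1}{\sqrt{2\pi}}\int_{(\Delta/\sigma)\sqrt{n/2} - \lambda}^{\infty} e^{-\frac12 t^2}\,dt = \beta. \tag{6} \]

The solution can be written $\hat n = (2\sigma^2/\Delta^2)(\lambda + \nu_\beta)^2$ which reduces to $\hat n =
(2\sigma^2/\Delta^2)(\nu_{\alpha_1} + \nu_\beta)^2$ when $\tilde\lambda$ is used for $\lambda$. The adequacy of this approximation
can be estimated with the help of the inequality*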


\[ P\{\bar x_k - \bar x_1 \ge \lambda\sigma\sqrt{2/n}\} - (k - 2)P\{\bar x_k < \bar x_2\} \le P\{n; \lambda, k, \Delta\} \le P\{\bar x_k - \bar x_1 \ge \lambda\sigma\sqrt{2/n}\}, \tag{7} \]


* The writer is indebted to the referee for this inequality, which is an improvement over 
the one originally used 







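which holds when $m_1 = m_2 = \dots = m_{k-1}$, $m_k = m_1 + \Delta$. To derive this inequality, it is obvious from the definition of $P\{n; \lambda, k, \Delta\}$ that $P\{n; \lambda, k, \Delta\} \le
P\{\bar x_k - \bar x_1 \ge \lambda\sigma\sqrt{2/n}\}$, while from Bonferroni's Inequality we have

\[ P\{n; \lambda, k, \Delta\} \ge 1 - P\{\bar x_k - \bar x_1 < \lambda\sigma\sqrt{2/n}\} - \sum_{i=2}^{k-1} P\{\bar x_k \le \bar x_i\} = P\{\bar x_k - \bar x_1 \ge \lambda\sigma\sqrt{2/n}\} - (k - 2)P\{\bar x_k < \bar x_2\}. \]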


Hence when $\hat n$ is found from (6), it follows from (7) that $[1 - P(\hat n; \lambda, k, \Delta)]$
will exceed $\beta$ by an amount which is less than

\[ (k - 2)P\{\bar x_k \le \bar x_2\} = \frac{k - 2}{\sqrt{2\pi}}\int_{(\Delta/\sigma)\sqrt{\hat n/2}}^{\infty} e^{-\frac12 t^2}\,dt. \]

We have attempted to indicate the adequacy of the approximation $\hat n$ found
from (6) by computing the upper bound for $[1 - P(\hat n; \lambda, k, \Delta)]$ for several


TABLE II 
Upper bound for [1 — P(i;X, k, A)] 


05 02 


. 2025 . 2008 
.05 .0502 -0500 





.20 .2031 .2010 
.05 .0501 .0500 


values of $\alpha$, $\beta$ and $k$; in these calculations the value of $\tilde\lambda$ given in Table I was
used for $\lambda$. The calculations are summarized in Table II.

It appears that for $\beta \le .20$, $\alpha \le .05$, the upper bound for $[1 - P(\hat n; \lambda, k, \Delta)]$
exceeds the corresponding $\beta$ by a small amount which can be neglected for most
practical purposes.


3. The normal case with unknown variance. The case when $\sigma$ is unknown will
now be briefly considered. Let

\[ s^2 = \frac{1}{k(n - 1)}\sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij} - \bar x_i)^2 \]

be the pooled estimate of $\sigma^2$ based on $n' = k(n - 1)$ degrees of freedom. The
statistical procedure in (1) is modified as follows:

\[ \text{select } \Pi^* \text{ if } \bar x^* - \bar x_1 \ge \lambda_n s\sqrt{2/n}; \qquad \text{select } \Pi_1 \text{ if } \bar x^* - \bar x_1 < \lambda_n s\sqrt{2/n}. \tag{8} \]






The exact equation that $\lambda_n$ must satisfy in order to have $P\{\bar x^* - \bar x_1 < \lambda_n s
\sqrt{2/n} \mid m_1 = m_2 = \dots = m_k\} = 1 - \alpha$ and an explicit expression for $P(n;
\lambda_n, k, \Delta/\sigma)$, the probability that $\Pi_k$ will be selected as best when $m_1 = m_2 =
\dots = m_{k-1}$, $m_k/\sigma = m_1/\sigma + \Delta/\sigma$, can easily be found by a procedure similar
to that used for (2) and (5); however the results are complicated and instead
we will proceed directly to discuss approximate procedures. Let $C_j$ denote the
event $\bar x_j - \bar x_1 > \lambda_n s\sqrt{2/n}$ $(j = 2, 3, \dots, k)$, and let $t_{n'}$ denote a random variable having the $t$ distribution with $n'$ degrees of freedom. With the aid of tables
of the $t$ distribution an approximation $\tilde\lambda_n$ to $\lambda_n$ can be found so that $P\{\bar x_2 - \bar x_1 >
\tilde\lambda_n s\sqrt{2/n}\} = P\{t_{n'} > \tilde\lambda_n\} = \alpha/(k - 1) = \alpha_1$. Then the probability that $\Pi_1$
will be selected as best when all the means are equal will exceed $1 - \alpha$ by an
amount which is less than $\tfrac12(k - 1)(k - 2)P(C_2 \cdot C_3)$. For bounds on the second
type of error, we have
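\[ P\left\{\bar x_k - \bar x_1 \ge \lambda_n s\sqrt{2/n} \,\Big|\, \frac{m_k}{\sigma} = \frac{m_1}{\sigma} + \frac{\Delta}{\sigma}\right\} - (k - 2)P\left\{\bar x_k < \bar x_2 \,\Big|\, \frac{m_k}{\sigma} = \frac{m_2}{\sigma} + \frac{\Delta}{\sigma}\right\} \le P\{n; \lambda_n, k, \Delta/\sigma\} \le P\left\{\bar x_k - \bar x_1 \ge \lambda_n s\sqrt{2/n} \,\Big|\, \frac{m_k}{\sigma} = \frac{m_1}{\sigma} + \frac{\Delta}{\sigma}\right\}. \]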
All these inequalities are easily obtained as in Section 2. To evaluate the bound
for the first type of error, a good approximation can usually be found by regarding $s/\sigma$ to be normally distributed with mean 1 and variance $1/(2n')$; using this
approximation it is easy to verify that

\[ P(C_2 \cdot C_3) = P\left\{U \ge \tilde\lambda_n\sqrt{\frac{2n'}{2n' + \tilde\lambda_n^2}},\ V \ge \tilde\lambda_n\sqrt{\frac{2n'}{2n' + \tilde\lambda_n^2}}\right\}, \]

where $(U, V)$ has a bivariate normal distribution with zero means, unit variances and correlation $\rho = (n' + \tilde\lambda_n^2)/(2n' + \tilde\lambda_n^2)$. This same device might also
be used to approximate the upper and lower bounds for $P(n; \lambda_n, k, \Delta/\sigma)$ as an
alternative to evaluating the bounds by using tables of the non-central $t$ distribution. Finally, to obtain the value of $n$ so that $P(n; \lambda_n, k, \Delta/\sigma) = 1 - \beta$
a good first approximation will usually be given by $n_0 = (2\sigma^2/\Delta^2)(\nu_{\alpha_1} + \nu_\beta)^2$;
after computing $\tilde\lambda_{n_0}$ and the corresponding upper and lower bounds for $P(n_0;
\tilde\lambda_{n_0}, k, \Delta/\sigma)$, the first approximation $n_0$ can be modified if necessary and the
process iterated.

It should be noted that in order to find the sample size required to control
the second type of error, either an approximate value of $\sigma$ must be known from
past experience, or else it must be sufficient for the practical problem under consideration to know the probability of selecting the best experimental category
as a function of the ratio of $\Delta$ to $\sigma$. It is possible to eliminate the dependence of
the result on $\sigma$ by making use of a two-stage sampling scheme due to Stein
[3]; this and other sequential procedures may be considered in another paper.


4. The binomial case. In this section a brief treatment of the binomial case 
will be given, based on the use of the inverse sine transformation. That is, we 







will use the fact that if $p$ is the observed proportion of successes in $n$ independent
trials with a constant probability $P$ of a success, then $\arcsin\sqrt{p}$ is for large $n$
approximately normally distributed with mean $\arcsin\sqrt{P}$ and variance
$1/(4n)$ (provided the angle is given in radian measure). This transformation
was previously used by W. Allen Wallis and the present writer [4] to design 
experiments for comparing the percentages associated with one experimental 
and one standard category. The material in this section can be regarded as one 
possible extension of that work to the case where we are dealing with more 
than one experimental category. 
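The quality of the variance approximation $1/(4n)$ can be examined directly, since the exact distribution of $p$ is binomial. The following Python sketch is an illustration added here, with a hypothetical function name; it sums over the binomial distribution to obtain the exact mean and variance of $\arcsin\sqrt{p}$ for given $n$ and $P$.

```python
import math


def arcsine_moments(n, P):
    """Exact mean and variance of arcsin(sqrt(r/n)) for r ~ Binomial(n, P)."""
    mean = 0.0
    second_moment = 0.0
    for r in range(n + 1):
        prob = math.comb(n, r) * P ** r * (1.0 - P) ** (n - r)
        u = math.asin(math.sqrt(r / n))  # angle in radians
        mean += prob * u
        second_moment += prob * u * u
    return mean, second_moment - mean * mean
```

For instance, with $n = 174$ and $P = .75$ the returned variance can be compared with $1/(4 \cdot 174)$ and the mean with $\arcsin\sqrt{.75}$.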

Let $r_i$ be the number of successes in the $n$ observations with category $\Pi_i$.
Let $p_i = r_i/n$, let $u_i = \arcsin\sqrt{p_i}$ and let $P_i$ = the true probability of a success
with category $\Pi_i$. Let $p^* = \max(p_2, p_3, \dots, p_k)$, $u^* = \max(u_2, u_3, \dots,
u_k)$, and let $\Pi^*$ be the experimental category with observed percentage of successes $p^*$. If there should happen to be more than one category with the observed percentage of successes $= p^*$, select $\Pi^*$ at random from the subset
having $p_i = p^*$.


We now propose the following statistical procedure for selecting one of the
$k$ categories.

\[ \text{select } \Pi^* \text{ if } u^* - u_1 \ge \lambda\sqrt{\frac{1}{2n}}; \qquad \text{select } \Pi_1 \text{ if } u^* - u_1 < \lambda\sqrt{\frac{1}{2n}}, \tag{9} \]

where $\lambda$ is to be chosen so that if $P_1 \ge \max(P_2, P_3, \dots, P_k)$ the probability
that $\Pi_1$ is selected as best will be $\ge 1 - \alpha$. We assume that $n$ is large enough so
that the set $\{u_i\}$ can be regarded as normally distributed with common variance
$1/(4n)$ and means $\arcsin\sqrt{P_i}$. Therefore the problem is once again essentially
equivalent to the normal case with known variance, which was treated in Section 2, and the value of $\lambda$ is given by the solution of (2). To find the sample size
$n$ so that if $P_1 = P_2 = \dots = P_{k-1} = P$ and $P_k = P + \delta$ $(\delta > 0)$, the probability
that $\Pi_k$ is selected as best will equal $1 - \beta$, set $\Delta = \arcsin\sqrt{P + \delta} - \arcsin
\sqrt{P}$, and the required $n$ will be given by (5) when $\sigma\sqrt{2/n}$ is replaced by $\sqrt{1/(2n)}$.
For values of $\alpha$, $\beta$, and $k$ so that $\alpha \le .05$, $\beta \le .20$, and $k \le 6$, it has been shown
in Section 2 that if we use approximate values of $\lambda$ and $n$ given by (4) and (6),
the change in the probabilities considered will be small, and will ordinarily be
of little practical importance. Using the notation of the last section, (4) and (6)
for the binomial case are equivalent to $\tilde\lambda = \nu_{\alpha_1}$ and $\hat n = (\nu_{\alpha_1} + \nu_\beta)^2/(2\Delta^2)$.


We conclude this section by discussing a specific problem. Consider a situation in which we are interested in investigating the effect of three experimental
treatments on a certain disease, where it is known from previous experience
that the probability of survival with the standard treatment is of the order of
magnitude of .75. The problem we wish to consider is that of designing a statistical procedure (based on a single stage of sampling) for selecting one of the
4 treatments which will have the following properties: (a) the probability of
selecting an experimental treatment as best when in fact it is inferior to the
standard treatment is to be $\le .05$; (b) if one of the experimental treatments
should happen to increase the probability of survival to .90, while the probabilities of survival for the three other treatments are $\le .75$, then the probability
that the superior experimental treatment will be selected as best should be
$\ge .95$. Upon setting $\alpha = .05$, $\beta = .05$, and $k = 4$ we find $\tilde\lambda = 2.128$, $\Delta =
.202$, $\hat n = (2.128 + 1.645)^2/(2\Delta^2) = 174$.

The required statistical procedure having properties (a) and (b) is the
following. A group of 696 animals are all inoculated with the specific disease
under consideration, and then the animals are subdivided in some random manner
into 4 groups each consisting of 174 animals. The first of these groups is
given the standard treatment, and the remaining groups each receive one of the
experimental treatments. After the experiment is completed, if
$\arcsin\sqrt{p^*} - \arcsin\sqrt{p_1} \ge 2.128/\sqrt{2(174)} = .114$, we
conclude that the experimental treatment with observed percentage of success
$= p^*$ is best; otherwise we conclude that the standard treatment is really
better than any of the experimental treatments.
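
The arithmetic of this example can be retraced with a short sketch. It assumes, as the quoted figures suggest, that $\lambda$ is the upper $\alpha/(k-1)$ point of the standard normal distribution and that $v_\beta$ is its upper $\beta$ point; the function name `paulson_binomial_design` is hypothetical and used only for illustration.

```python
from math import asin, sqrt
from statistics import NormalDist


def paulson_binomial_design(alpha, beta, k, p_standard, p_better):
    """Return (lam, delta, n) for the arc-sine selection procedure.

    Assumption: lam is taken as the upper alpha/(k-1) normal point.
    """
    std = NormalDist()
    lam = std.inv_cdf(1 - alpha / (k - 1))
    v_beta = std.inv_cdf(1 - beta)
    # Difference of the arc-sine transformed probabilities.
    delta = asin(sqrt(p_better)) - asin(sqrt(p_standard))
    n = (lam + v_beta) ** 2 / (2 * delta ** 2)
    return lam, delta, n


lam, delta, n = paulson_binomial_design(0.05, 0.05, 4, 0.75, 0.90)
# Decision threshold for u* - u_1 with the group size 174 used in the text.
threshold = lam / sqrt(2 * 174)
print(round(lam, 3), round(delta, 3), round(n, 1), round(threshold, 3))
```

With unrounded inputs the computed $n$ comes out slightly above 174, so the figure in the text should be read as the result of working with the rounded value of $\Delta$.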


REFERENCES 


[1] William Feller, An Introduction to Probability Theory and Its Applications,
John Wiley and Sons, 1950.

[2] K. Pearson, Tables for Statisticians and Biometricians, Part II, 1st ed.,
Cambridge University Press, 1931.

[3] Charles Stein, "A two sample test for a linear hypothesis whose power is
independent of the variance," Annals of Math. Stat., Vol. 16 (1945),
pp. 253-258.

[4] E. Paulson and W. A. Wallis, "Planning and analyzing experiments involving
two percentages," Selected Techniques of Statistical Analysis, McGraw-Hill
Book Co., pp. 247-265.





ON THE MOST ECONOMICAL SAMPLE SIZE FOR CONTROLLING 
THE MEAN OF A POPULATION 


By H. WEILER
New South Wales University of Technology, Sydney, Australia 


Summary. For quality control charts controlling the mean of a population 
either small samples may be taken out at frequent intervals or larger samples 
at less frequent intervals. In this paper, a simple formula is derived by which 
the most suitable sample size can be determined, leading to the detection of any 
given change of the population mean with a minimum of inspection. 


1. Introduction. Consider a normal variate x representing some measure of 
a mass-produced article, and suppose that control limits similar to those used 
in the preparation of control charts [1], [2] are to be determined to detect changes 
of the population mean of x. Such control limits may be placed on a chart similar 
to that used in control chart analysis. 

After the mean and standard deviation of a population have been estimated
by means of an initial large sample, smaller samples of fixed size $N$ are
taken during the production, and their arithmetic means $\bar{x} = \sum x/N$
are calculated. A chart is then constructed with control limits
$m \pm 3\sigma/\sqrt{N}$, where $m$ and $\sigma$ are the estimates of the
population mean and S.D. obtained from the original large sample. The various
values of $\bar{x}$ are then entered in the chart in chronological order, and
as soon as one such value falls outside the control limits, production is
stopped to allow investigation.

The aim of this paper is to determine the most economical sample size, that
is, that value of $N$ which would indicate a change of the population mean
after a minimum amount of inspection. It will be found that the most
economical sample size depends on the amount by which the population mean has
changed. Thus, if the population mean changes from $m$ to $m + k\sigma$, while
$\sigma$ remains constant, the most economical sample size $N = n$ will be a
function of $k$. In particular, it will be shown that this function is


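\[
\begin{aligned}
n &= \frac{12.0}{k^2} && \text{when the control limits are } m \pm \frac{3.09\sigma}{\sqrt{n}}, \\
n &= \frac{11.1}{k^2} && \text{when the control limits are } m \pm \frac{3\sigma}{\sqrt{n}}, \\
n &= \frac{6.65}{k^2} && \text{when the control limits are } m \pm \frac{2.58\sigma}{\sqrt{n}}, \\
n &= \frac{[\text{illegible}]}{k^2} && \text{when the control limits are } m \pm \frac{2.33\sigma}{\sqrt{n}}.
\end{aligned}
\]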




Tables are calculated, giving the value of n for other control limits and giving 
also the average amount of inspection in each case. 
Finally, a new chart, based on two sets of control limits, is discussed briefly. 


2. The average amount of inspection for a given N. Let $m$ be the original
mean and $\sigma$ the S.D. of the population, and let $m \pm B\sigma/\sqrt{N}$
be the control limits adopted for the arithmetic mean $\bar{x} = \sum x/N$ of
a sample of given size $N$.


TABLE 1

    N     3.09 - 0.4 sqrt(N)       P        S(N)     A(N)

    1           2.690            0.0036     278      278
    2           2.524            0.0058     173      345
    3           2.396            0.0083     121      362
    4           2.290            0.0110      91      364
    5           2.195            0.0141      71      355
    9           1.890            0.0294      34.0    306
   16           1.490            0.0681      14.7    235
   25           1.090            0.1379       7.25   182
   36           0.690            0.2451       4.08   147
   49           0.290            0.3859       2.59   127
   64          -0.110            0.5438       1.84   118
   75          -0.375            0.6460       1.55   116  Min.
   81          -0.510            0.6950       1.44   117



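If the population mean changes from $\mu = m$ to $\mu = m + k\sigma$
($k > 0$), the probability that $\bar{x}$ exceeds the upper control limit
$m + B\sigma/\sqrt{N}$ is (assuming that $\sigma$ remains unchanged)

\[
\begin{aligned}
P &= P\left(\bar{x} \ge m + \frac{B\sigma}{\sqrt{N}} \,\Big|\, \mu = m + k\sigma\right) \\
  &= P\left(\bar{x} - m - k\sigma \ge \frac{B\sigma}{\sqrt{N}} - k\sigma \,\Big|\, \mu = m + k\sigma\right) \\
  &= P\left(\frac{\bar{x} - m - k\sigma}{\sigma/\sqrt{N}} \ge B - k\sqrt{N} \,\Big|\, \mu = m + k\sigma\right)
   = P(z \ge B - k\sqrt{N}),
\end{aligned}
\]

where $z$ is the standardized normal variate (mean zero and S.D. one).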

Thus, when $\mu$ becomes equal to $m + k\sigma$, about $100P$ samples in every
100 samples, or one in every $1/P$ samples, will give a mean $\bar{x}$ above
the upper control limit. It follows that on the average $S(N) = 1/P$ samples,
or $A(N) = N/P$ articles, have to be tested before a change of the mean from
$m$ to $m + k\sigma$ can be expected to be revealed.
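
The quantities $P$, $S(N)$ and $A(N)$ defined above are easy to evaluate numerically. The following is a minimal sketch using only the Python standard library; the function name `inspection` is hypothetical, and the standard normal tail is computed with `statistics.NormalDist` rather than read from printed tables, so the last digit may differ slightly from tabulated figures.

```python
from math import sqrt
from statistics import NormalDist


def inspection(N, k, B):
    """Return (P, S, A) for sample size N, shift k (in S.D. units), limit B."""
    # P(z >= B - k*sqrt(N)) for a standard normal variate z.
    P = 1 - NormalDist().cdf(B - k * sqrt(N))
    S = 1 / P          # average number of samples until detection
    A = N / P          # average number of articles inspected
    return P, S, A


for N in (4, 25, 75):
    P, S, A = inspection(N, k=0.4, B=3.09)
    print(N, round(P, 4), round(S, 2), round(A))
```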







3. Example. For illustration, take the example k = 0.4, B = 3.09. We obtain 
Table 1 for various values of N (using normal probability tables). 

The example shows the following interesting points:

(a). Suppose that the population mean changes by $+0.4$ standard deviations.
If the chart is based on the customary sample size used for control charts [3],
namely, $N = 4$ or $N = 5$, about 360 articles must be tested before detection
of the change can be expected. On the other hand, if the control chart is based
on a sample size between 50 and 80 (say), about 120 items only are required to
indicate the change. The usual engineering practice of using charts for small
samples thus requires about three times as much inspection as would be required
with a chart for a suitably large sample size.

(b). Suppose that the population mean does not change. In that case $\bar{x}$
will fall above the upper control limit about once in 1000 samples. This means
that with sample size 4 a "false alarm" will be raised about once for every
4000 articles tested. With a sample size 75, on the other hand, a "false alarm"
will be raised only once for about 75,000 articles tested.

The two points raised suggest that in certain cases it may be of advantage to
deviate from the usual practice of using small sample control charts. A third
argument in favor of large samples is that the control limits are based on the
assumption that $\bar{x}$ is normally distributed. This assumption is usually
satisfied with great accuracy when the sample is large, but may not be
justified when the sample is small.

The above arguments hold also for other values of k and B, and we may state 
that, unless special reasons exist for making the samples small, the sample size 
N should be chosen such that the average amount of inspection A(N) becomes 
a minimum. 


4. The minimum amount of inspection. We define the most economical sample 
size n as that value of N for which the average amount of inspection A(N) re- 
quired to detect a given change of the population mean becomes a minimum. 

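If the standardized normal probability density is denoted by
$\varphi(z) = e^{-z^2/2}/\sqrt{2\pi}$, we have

\[
A(N) = N/P = N \Big/ \int_{B - k\sqrt{N}}^{\infty} \varphi(z)\,dz. \tag{2}
\]

Differentiating with regard to $N$, we have

\[
\frac{dA}{dN} =
\frac{\int_{B - k\sqrt{N}}^{\infty} \varphi(z)\,dz - \tfrac{1}{2}k\sqrt{N}\,\varphi(B - k\sqrt{N})}
     {\left[\int_{B - k\sqrt{N}}^{\infty} \varphi(z)\,dz\right]^2}. \tag{3}
\]

The condition for $A(N)$ to be a minimum is $(dA/dN)_{N=n} = 0$, which reduces
to

\[
P(u) = \int_{u}^{\infty} \varphi(z)\,dz = \tfrac{1}{2}(B - u)\varphi(u), \tag{4}
\]

where $u = B - k\sqrt{n}$.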







Equation (4) is easily solved, using tables of ordinates and integrals of the
normal distribution. To do this, we put the minimum condition (4) in the form

\[
Q = \frac{2P(u)}{\varphi(u)} = B - u. \tag{5}
\]

The left side of this equation can then be calculated for any value of $u$ and
the values $n$, $S(n)$, and $A(n)$ can be deduced. We have

\[
n = \frac{Q^2}{k^2}, \tag{6}
\]

\[
S(n) = \frac{1}{P}, \tag{7}
\]

\[
A(n) = \frac{n}{P} = \frac{Q^2}{k^2 P}. \tag{8}
\]

The values of $S(n)$, $k^2A(n)$, and $k^2n$ are shown in Table 2 for various
values of $u$.

It can be seen from this table that for every value $B > 2.24$ two values $N$
exist which satisfy the condition $dA/dN = 0$. Only the larger value of $N$,
however, corresponds to a minimum amount of inspection. (The smaller value of
$N$ corresponds to a maximum of $A(N)$.) Thus, for $B = 3$ we find
$n_1 = 11.1/k^2$ and $n_2 = 0.606/k^2$. The amounts of inspection corresponding
to $n_1$ and $n_2$ are $A(n_1) = 17.65/k^2$ and $A(n_2) = 46/k^2$ respectively,
which shows that samples of size $n_2$ would lead to a much larger amount of
inspection than samples of size $n_1$. The lower part of Table 2, corresponding
to values of $u$ greater than 0.6, can therefore be ignored.

Besides this, we notice that no values of u exist for which B is less than 2.24. 
This means that for such values of B no value of N exists which would make 
dA/dN equal to zero. This case will be discussed later (Section 6). 
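
Instead of reading the solution of (4) from tables, the root can be found numerically. The sketch below applies bisection to condition (5) on the branch $u < 0.6$, which corresponds to the minimum of $A(N)$; the function names and the bracketing interval $[-5, 0.6]$ are assumptions made for illustration, not part of the original method.

```python
from math import erfc, exp, pi, sqrt


def upper_tail(u):
    """P(u): upper tail probability of the standard normal distribution."""
    return 0.5 * erfc(u / sqrt(2))


def density(u):
    """phi(u): standard normal density."""
    return exp(-0.5 * u * u) / sqrt(2 * pi)


def economical_design(B, lo=-5.0, hi=0.6, iters=200):
    """Solve 2P(u)/phi(u) = B - u for u < 0.6 by bisection.

    Returns (u, k^2 n, S(n), k^2 A(n)) following equations (5) to (8).
    """
    def f(u):
        return 2 * upper_tail(u) / density(u) + u - B

    if f(lo) * f(hi) > 0:
        raise ValueError("no root on this branch (B is probably below about 2.24)")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    u = 0.5 * (lo + hi)
    Q = B - u
    P = upper_tail(u)
    return u, Q * Q, 1 / P, Q * Q / P


for B in (3.09, 3.0):
    u, nk2, S, Ak2 = economical_design(B)
    print(B, round(u, 3), round(nk2, 2), round(S, 2), round(Ak2, 2))
```

For $B = 3.09$ and $B = 3$ the output agrees, to the accuracy quoted in the text, with $k^2 n = 12.0$ and $11.1$ respectively.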


5. Discussion of the special case B = 3.09. The values $B = 3.09$, $B = 3$,
$B = 2.58$, $B = 2.33$ are of special interest because they correspond to the
most frequently used control limits. In particular, we have for $B = 3.09$:
$n = 12.0/k^2$, $S(n) = 1.55$, $A(n) = 18.6/k^2$. The values of $n$ and $A(n)$
for various values of $k$ are tabulated below.


k; 0.2 0.3 . ee 6 @.7f 0.8 O03 1.6 

n 300 133 D4 : 28 21 17 12 

A(n) 464 ) Hy 38 29 23 19 
Alternatively, we may plot n and A(n) as functions of k; they become straight 
lines when plotted on log log paper. 

When the most economical sample size is taken, the average number of 
samples required for the detection of a change in the population mean is the 
same for all values of k; it is equal to S(n) = 1.55 when B = 3.09. This form of 







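TABLE 2

[Values of $S(n)$, $k^2A(n)$, and $k^2n$ for various values of $u$. Most
entries are not legible in the source. Legible pairs of normal integrals and
ordinates: (0.692, 0.352), (0.655, 0.368), (0.6443, 0.3725), (0.6293, 0.3778),
(0.618, 0.381), (0.579, 0.391), (0.540, 0.397). Legible final row:
ordinate 0.0339, $B = 3.00$, $k^2n = 0.606$, $S(n) = 76$, $k^2A(n) = 46$.]
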


control chart is thus very efficient, for it will indicate a change in about 2 out of 
3 samples, whereas it will raise a false alarm (or type I error [4]) only in about 
one out of 1000 samples. 







It appears from the above table that a control chart for small samples, say 
N between 4 and 10, is adequate only for the detection of changes of the mean 
greater than one standard deviation. 

There is, of course, no need to adhere rigidly to the sample size given by the 
table, for in most cases the exact change (if any) of the population mean would 
not be known beforehand, but the table will give valuable information regarding 
the approximate size of the sample required. 

Thus, in the case when B = 3.09, we would recommend the following sample 
sizes: 


    Change of Mean $\mu$ in S.D.'s      Sample Size N

              0.2-0.3                     100-300
              0.3-0.4                      70-150
              0.4-0.5                      50- 80
              0.5-0.6                      30- 60
              0.6-0.7                      25- 40
              0.7-0.8                      20- 30
              0.8-1.0                      10- 25

To detect changes larger than one standard deviation, any convenient size 
up to 10 could be taken. 


6. The case when B < 2.24. When $B < 2.24$, the derivative $dA/dN$ is
different from zero for all values of $N$. This means that $A(N)$ has no
relative minimum but increases with $N$ for all values of $N$. The average
amount of inspection is then smallest when $N = 1$, but, for reasons stated in
Section 3, it is usually not advisable to take $N$ smaller than, say, 4.

The only case of this type that might be of interest for the purpose of quality
control is the case $B = 1.96$, because the control limits
$m \pm 1.96\sigma/\sqrt{N}$ are often used as so-called inner limits [5]. The
average amount of inspection is then (Section 2)
$A(N) = N/P = N/P(z \ge u)$, where $u = B - k\sqrt{N}$. This gives
$N = (B - u)^2/k^2$ and $A(N) = (B - u)^2/(k^2P)$. Taking $B = 1.96$ and
substituting different values for $u$, we obtain Table 3.
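
The entries of such a table follow directly from the two formulas just given. A minimal sketch, with the hypothetical helper `inner_limit_row` and a handful of illustrative values of $u$, is:

```python
from statistics import NormalDist


def inner_limit_row(u, B=1.96):
    """Return (N k^2, S(N), A(N) k^2) for a given u = B - k*sqrt(N)."""
    P = 1 - NormalDist().cdf(u)      # P(z >= u)
    Nk2 = (B - u) ** 2               # N k^2 = (B - u)^2
    return Nk2, 1 / P, Nk2 / P


us = [1.76, 1.26, 0.96, 0.16, -0.54]   # illustrative values of u
rows = [inner_limit_row(u) for u in us]
for u, (Nk2, S, Ak2) in zip(us, rows):
    print(u, round(Nk2, 2), round(S, 1), round(Ak2, 2))
```

For these values $A(N)k^2$ grows steadily as $Nk^2$ grows, which is the behaviour expected when $B < 2.24$.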


The table shows clearly that the average amount of inspection $A(N)$ decreases
with the size $N$ of the sample, that is, the smaller we make $N$ the more
economical will be the test.


7. A chart with two sets of control limits. When a chart with two sets of con- 
trol limits is used, it is usually set up as follows. 

After the mean $m$ and S.D. $\sigma$ of the population have been reliably
estimated, a suitable sample size $N$ is chosen, and inner limits
($m \pm 1.96\sigma/\sqrt{N}$) and outer limits ($m \pm 3.09\sigma/\sqrt{N}$)
are calculated and entered in the chart. Production is stopped as soon as one
$\bar{x}$ value falls outside the outer control limits. The main purpose of
the inner limits is to provide a first warning when a point falls outside
these limits. Production is then not interrupted, but samples are taken more
frequently in order to reach a decision without delay.

Now we have seen that small samples are most suitable for the inner limits
while larger samples should be taken for the outer limits. It seems therefore
indicated to construct a chart involving two sample sizes. (If a smaller sample
size is used for the inner control limits, the terms "inner" and "outer" become
misleading because the inner limits will then actually be wider than the outer
limits.) To detect a change of the population mean from $m$ to $m + k\sigma$,
a chart may be prepared as follows. (1) Take samples of size $N = 4$ or 5 (or
any other


TABLE 3
(B = 1.96)

      u        P       B - u    Nk^2 = (B - u)^2    S(N) = 1/P    A(N)k^2

     1.76    .0392      0.2          0.04              25.5         1.02
     1.66    .0485      0.3          0.09              20.6         1.86
     1.56    .0594      0.4          0.16              16.8         2.69
     1.46    .0721      0.5          0.25              13.9         3.47
     1.36    .0869      0.6          0.36              11.5         4.14
     1.26    .104       0.7          0.49               9.6         4.71
     1.16    .123       0.8          0.64               8.1         5.20
     1.06    .145       0.9          0.81               6.9         5.59
     0.96    .168       1.0          1.00               6.0         5.95
     0.76    .224       1.2          1.44               4.5         6.43
     0.56    .288       1.4          1.96               3.5         6.81
     0.36    .359       1.6          2.56               2.8         7.13
     0.16    .436       1.8          3.24               2.3         7.43
    -0.04    .516       2.0          4.00               1.9         7.75
    -0.54    .705       2.5          6.25               1.4         8.87



convenient small sample size) and construct a chart with control limits
$m \pm 1.96\sigma/\sqrt{N}$. (2) Calculate a second set of control limits
$m \pm 3.09\sigma/\sqrt{n}$, based on a sample size $n = \lambda N$, which is
a multiple of $N$ as close as possible to the most economical sample size
$12.0/k^2$. (3) Calculate the means $\bar{x} = \sum x/N$ and
$\bar{X} = \sum x/n$. (4a) If a value $\bar{X}$ falls outside the limits
$m \pm 3.09\sigma/\sqrt{n}$, stop production and investigate; (4b) if a value
$\bar{x}$ falls outside the limits $m \pm 1.96\sigma/\sqrt{N}$, do not
interrupt production but take out samples frequently to reach a decision.

While it is true that the "inner" limits serve mainly to provide a first
warning for a possible change of the population mean, they may be used also to
reach a definite decision. If, for instance, two successive values of
$\bar{x}$ fall above the upper inner limit, we may regard this as a significant
indication for a change in the population mean, because the probability for
this to happen when the population mean is unchanged is only
$(0.025)^2 = 0.0006$, which is less than the probability that a single value
$\bar{X}$ falls above the upper outer limit $m + 3.09\sigma/\sqrt{n}$.

Again, it is easily shown that the probability that 4 out of 16 successive
$\bar{x}$ values fall above the upper inner limit is about the same as the
probability that a single value falls above the upper outer limit. We are
therefore justified in regarding such an occurrence as a significant
indication for a change of the population mean.
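
These probabilities can be checked with a few lines of arithmetic. The sketch below assumes that successive sample means are independent and reads "4 out of 16" as "at least 4 out of 16"; both are assumptions of this illustration rather than statements from the text.

```python
from math import comb
from statistics import NormalDist

p_inner = 0.025                        # one mean above the upper inner limit
p_outer = 1 - NormalDist().cdf(3.09)   # one mean above the upper outer limit

# Two successive means above the upper inner limit.
two_in_a_row = p_inner ** 2


def at_least(m, n, p):
    """Binomial probability of at least m successes in n independent trials."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(m, n + 1))


four_of_sixteen = at_least(4, 16, p_inner)
print(two_in_a_row, round(four_of_sixteen, 5), round(p_outer, 5))
```

Both warning rules come out somewhat below the single-point outer-limit probability of about one in a thousand.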


8. Conclusion. When a single set of control limits is used for a chart controlling 
the mean of a population, the above theory leads to much larger samples than 
those usually taken in industry. However, even with a chart of this type, small 
samples are not always uneconomical, for there are other factors to be considered 
which are not covered in this paper. 

For instance, it may be of advantage to divide samples into subgroups in 
order to detect changes due to definite anticipated causes, necessitating the use 
of smaller samples. 

Again, if a change of the population mean from $m$ to $m + k\sigma$ is
anticipated and the sample size is determined accordingly, any unsuspected
larger change would on the average be detected later than if a smaller sample
size had been used.

On the other hand, if small changes of the population mean of a given order 
are anticipated and if it is unlikely that larger changes occur, the sample size 
should be calculated according to the above theory. 

When it is convenient to use a more elaborate chart, containing two sets of
control limits, the theory leads to the customary small samples for the one set
($m \pm 1.96\sigma/\sqrt{N}$) and to the above large samples for the other set
($m \pm 3.09\sigma/\sqrt{n}$). Any unexpected larger change of the population
mean is then likely to be detected by means of the small samples.


REFERENCES 


[1] W. A. Shewhart, The Economic Control of Quality of a Manufactured Product,
D. Van Nostrand, 1931.

[2] W. A. Shewhart, Contribution of Statistics to the Science of Engineering,
University of Pennsylvania Press, 1940.

[3] W. B. Rice, Control Charts in Factory Management, John Wiley and Sons, 1947.

[4] J. Neyman and E. S. Pearson, "Contributions to the theory of testing
statistical hypotheses. I. Unbiassed critical regions of Type A and
Type A_1," Stat. Res. Memoirs, Vol. 1 (1936), pp. 1-37.

[5] B. P. Dudding and W. J. Jennett, Quality Control Charts, No. 600R, British
Standards Institution, 1942.

[6] American Society for Testing Materials, ASTM Manual on Quality Control of
Materials, Philadelphia, 1951.

[7] S. S. Wilks, "Determination of sample size for setting tolerance limits,"
Annals of Math. Stat., Vol. 12 (1941), pp. 91-96.





OPTIMUM ALLOCATION IN LINEAR REGRESSION THEORY 
By G. ELFVING


University of Helsingfors and Cornell University 


Summary. If for the estimation of $\beta_1, \beta_2$ different observations
("sources") of form (1.1) are potentially available, each of them being
repeatable as many times as we please, the question arises which of them the
experimenter should utilize, and in what proportions. With appropriate
optimality conventions the answer is the following. For the estimation of a
single quantity of form $\theta = a_1\beta_1 + a_2\beta_2$ the optimum
allocation comprises two sources only; for the estimation of both parameters,
the corresponding number is two or three; the best proportions are indicated
in Sections 2 and 4 below. Generalizations to more than two parameters and to
observations at different costs are briefly discussed.

The problem is related to Hotelling’s weighing problem [2] and to the topics 
treated by David and Neyman in [1]. 


1. Introduction. Consider an experimenter who wants to determine two unknown
quantities $\beta_1, \beta_2$. We assume that for this purpose a certain number
$r$ of different potential observations are at his disposal, the outcomes of
which are of form

\[
y_i = x_{i1}\beta_1 + x_{i2}\beta_2 + \eta_i \qquad (i = 1, \cdots, r); \tag{1.1}
\]

here $x_{i1}, x_{i2}$ denote known coefficients and $\eta_i$ a random variable
(the error term) with mean zero and standard deviation $\sigma$. (If the
standard deviations of the different observations are proportional to known
numbers $k_1, \cdots, k_r$, we have only to divide the equations (1.1) by these
numbers in order to restore the situation of the text.) We assume,
furthermore, that the experimenter may perform each of the observations as
many times as he pleases, or not at all, all actual observations being
uncorrelated. If he has decided upon a certain total $n$ of actual
observations, he is faced with the problem which of the potential ones should
be performed, and in what number. As an application, the reader may think of a
surveyor who wants to find the coordinates of a point by observing the
direction to it from given surrounding points of known position; in this case
the regression is, of course, only differentially linear.

In order to distinguish between the potential and the actual observations,
we will in the following refer to the former as sources (of information), to
the latter as observations. Since a source is essentially described by the
coefficient vector $\mathbf{x}_i = (x_{i1}, x_{i2})$, we will also briefly
speak of the source $\mathbf{x}_i$ or simply the source $i$. In the solution
of any particular optimum allocation problem, those sources which are actually
utilized will be called relevant, the others irrelevant.

The following normalization and idealization of our problem is mathematically
convenient. Let the required number of observations on the $i$th source be
$n_i = np_i$; the $p_i$'s are obviously multiples of $1/n$ fulfilling the
conditions

\[
p_i \ge 0, \qquad \sum p_i = 1. \tag{1.2}
\]

The mean of the observations on the $i$th source then has a regression
equation which may be written

\[
\bar{y}_i = x_{i1}\beta_1 + x_{i2}\beta_2 + \frac{\bar{\eta}_i}{\sqrt{p_i}}
\qquad (i = 1, \cdots, r), \tag{1.3}
\]

where $\bar{\eta}_i$ has variance $\sigma^2/n$. If a certain $p_i$ is zero, the
corresponding equation has to be left out of the system. For large $n$, the
$p_i$'s may be varied practically continuously over the range (1.2).
Idealizing this feature we get a large-sample problem, which is essentially
independent of $\sigma$ and $n$. For simplicity we may finally assume
$\sigma^2/n = 1$; in order to restore full generality we have only to
reintroduce this factor in all variance and covariance formulas below. We are
then faced with the following question:

Consider a planned set of observations of form (1.3) with
$E(\bar{\eta}_i) = 0$, $D^2(\bar{\eta}_i) = 1$, and with the weights $p_i$ at
our disposal, subject only to the conditions (1.2). What are the optimal
$p_i$'s?

The solution of this problem obviously presupposes a specification of the 
word “optimal,” that is, a specification of the estimation problem at hand. 

When applying the solution to practical problems, one has to remember that
our large-sample $p_i$'s must be approximated by multiples of $1/n$; here, a
fine-structure study might be necessary.


2. Estimation of a single quantity. In this section we shall deal with the case
where the interest of the experimenter is centered upon a particular linear
combination of the parameters, say

\[
\theta = a_1\beta_1 + a_2\beta_2. \tag{2.1}
\]

Particularly we may have $\theta = \beta_1$ or $\theta = \beta_2$.

Consider all linear forms $t = \sum c_i\bar{y}_i$ yielding unbiased estimates
of the quantity (2.1). For this purpose the $c_i$'s have to make the equality

\[
E(t) = \sum c_i(x_{i1}\beta_1 + x_{i2}\beta_2) = a_1\beta_1 + a_2\beta_2
\]

identical in $\beta_1, \beta_2$, that is, they must satisfy the vector equation

\[
\sum c_i\mathbf{x}_i = \mathbf{a}. \tag{2.2}
\]

For $r > 2$ there are infinitely many sets $\{c_i\}$ fulfilling this condition.
Among the corresponding estimates $t$ there is, for every fixed set of $p_i$'s,
one with least variance; this statistic is, according to a theorem by Gauss,
obtained by substituting in (2.1) the least-squares estimates
$\hat{\beta}_1, \hat{\beta}_2$ of the parameters, that is, the values
$\beta_1, \beta_2$ that minimize the weighted square sum

\[
\sum p_i(\bar{y}_i - x_{i1}\beta_1 - x_{i2}\beta_2)^2. \tag{2.3}
\]

The variance of this $t$ is, of course, a function of $p_1, \cdots, p_r$. We
want to find those weights $p_i$ which yield the smallest minimum variance.







The solution of this problem can be found by a simple geometric argument.
For this purpose we first notice that the smallest minimum variance, by
definition, equals $\min_p \min_c D^2\{\sum c_i\bar{y}_i\}$, the $c_i$'s and
$p_i$'s being subject to the conditions (2.2) and (1.2). Inverting the order
of minimization and calculating, to begin with, the minimum with respect to
the $p_i$'s for fixed $c_i$'s, we easily find

\[
\min_p D^2\Big\{\sum c_i\bar{y}_i\Big\} = \min_p \sum \frac{c_i^2}{p_i} = k_c^2, \tag{2.4}
\]

where $k_c = \sum |c_i|$. The minimizing $p$ values are $p_{ci} = |c_i|/k_c$.
It remains to minimize (2.4) with respect to the $c_i$'s, remembering the
condition (2.2) which we rewrite in the form

\[
\mathbf{a} = k_c \sum p_{ci}\,\operatorname{sgn} c_i \cdot \mathbf{x}_i = k_c\mathbf{a}_c
\quad \text{(say)}. \tag{2.5}
\]

Fig. 1

The factor $k_c$ being a positive scalar, the sum on the right-hand side
represents a vector $\mathbf{a}_c$ with the same direction as $\mathbf{a}$.
The weights $p_{ci}$ being nonnegative with sum one, it is clear that the
endpoint of this vector lies on or within the convex polygon $\Pi$ spanned by
the vectors $\pm\mathbf{x}_1, \cdots, \pm\mathbf{x}_r$ (Fig. 1). Since by
(2.5), $k_c$ is the length ratio of the vectors $\mathbf{a}$ and
$\mathbf{a}_c$, it is obvious that (2.4) reaches its minimum when the endpoint
of $\mathbf{a}_c$ coincides with the intersection $A^*$ of the vector
$\mathbf{a}$ (or its extension) with the polygon $\Pi$. If this point lies, for
example, between the corners $X_1$ and $X_2$ of $\Pi$ (see Fig. 1), it is seen
that the coefficients $p_{ci}$ in (2.5), and hence the optimum weights, must
be in the ratio $A^*X_2 : A^*X_1$ for $i = 1, 2$, and zero for
$i = 3, \cdots, r$. The smallest minimum variance is given by $(OA/OA^*)^2$.
In terms of our original problem we may state this result as follows:

For the estimation of a single quantity (2.1), two and only two of the sources
(1.1) are relevant; they have to be used in the proportion shown by the
geometric argument above. (There are obvious modifications of this statement
in the cases where $\mathbf{a}$ passes through a corner of $\Pi$, or where
three or more of the $\mathbf{x}_i$'s have their endpoints on the same line.)




If the vector of a certain source falls entirely inside the polygon spanned by 
the remaining vectors, this source is of no use for the estimation of any single 
quantity (2.1). 

The fact that, with optimum allocation, only two of the potential observations
are actually performed, has a somewhat surprising consequence. The square sum
(2.3) reducing to two terms only, its absolute minimum, zero, is reached when
$\beta_1$ and $\beta_2$ are chosen so as to make both terms vanish. The
estimates $\hat{\beta}_1, \hat{\beta}_2$ are thus obtained simply by dropping
the error terms in (1.3) and solving for the parameters.

Example. Consider the case where the potential observations are of form

\[
y_i = \alpha + \beta X_i + \eta_i, \qquad X_1 < X_2 < \cdots < X_r,
\]

that is, the case of linear regression in the elementary sense. Here the
polygon $\Pi$ is a parallelogram spanned by the vectors $(1, X_1)$, $(1, X_r)$,
and their opposite vectors. If the interest of the experimenter is centered
upon $\beta$ alone, it is seen that he has to use only the extreme sources 1
and $r$; the observations on them have to be equal in number. If $\alpha$ alone
is to be estimated and if all $X_i$'s have the same sign, the extreme ones
should again be used, this time in proportion $X_r : X_1$. If the $X_i$'s
include both positive and negative numbers, then the values of the $p_i$'s are
arbitrary with the sole condition that the weighted average
$\sum p_iX_i = 0$. Since in practice the $p_i$'s have to be multiples of $1/n$,
$n$ being the number of observations, it is usually impossible to arrange the
$p_i$'s so that the condition mentioned is exactly fulfilled. In such
circumstances, one can still make a useful choice between different
approximations.^1
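
The two-source result suggests a simple numerical check: for every pair of sources, solve (2.2) for the two coefficients, evaluate the variance $k_c^2 = (|c_i| + |c_j|)^2$ from (2.4), and keep the best pair. The sketch below does this by brute force; the function name `best_pair` and the abscissas 1, 2, 3 are hypothetical, and the degenerate case in which $\mathbf{a}$ is parallel to a single source is not handled.

```python
from itertools import combinations


def best_pair(sources, a):
    """Return (variance, (i, j), (p_i, p_j)) for the best pair of sources."""
    best = None
    for i, j in combinations(range(len(sources)), 2):
        (x11, x12), (x21, x22) = sources[i], sources[j]
        det = x11 * x22 - x12 * x21
        if abs(det) < 1e-12:
            continue                       # parallel sources cannot span a
        # Solve c_i * x_i + c_j * x_j = a by Cramer's rule.
        ci = (a[0] * x22 - a[1] * x21) / det
        cj = (x11 * a[1] - x12 * a[0]) / det
        kc = abs(ci) + abs(cj)
        if best is None or kc ** 2 < best[0]:
            best = (kc ** 2, (i, j), (abs(ci) / kc, abs(cj) / kc))
    return best


# Elementary linear regression y = alpha + beta * X with X = 1, 2, 3.
sources = [(1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
slope = best_pair(sources, a=(0.0, 1.0))       # estimate beta alone
intercept = best_pair(sources, a=(1.0, 0.0))   # estimate alpha alone
print(slope)
print(intercept)
```

In this illustration the slope is best estimated from the two extreme sources with equal weights, and the intercept from the same two sources with weights in the ratio $3 : 1$, in line with the example above.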

Generalization. The generalization to three parameters is obvious. The polygon
$\Pi$ is replaced by a convex polyhedron with triangular side-planes. In any
estimation problem concerned with a single linear combination of the parameters
there will in general be three relevant sources. For more than three
parameters, the geometric rule must be replaced by an algebraic procedure.


3. Estimation of both parameters. For a set of actual observations, that is,
for fixed $p_1, \cdots, p_r$, the least-squares technique yields minimum
variance estimates of both parameters as well as of all linear combinations of
them; nothing is gained in the accuracy of one estimate by giving up accuracy
in another. In the present setup where the weights $p_i$ are variable, some
information is needed concerning the desired relative accuracy of different
estimates. A reasonable approach seems to be to choose an appropriate positive
definite quadratic form in the estimation errors $\hat{\beta}_1 - \beta_1$,
$\hat{\beta}_2 - \beta_2$ and minimize its expectation by proper choice of the
weights. By a linear transformation of the parameters (and a corresponding
transformation of the coefficient vectors) the problem can always be reduced
to the minimization of the particular form

\[
q = E\{(\hat{\beta}_1 - \beta_1)^2 + (\hat{\beta}_2 - \beta_2)^2\}
  = D^2(\hat{\beta}_1) + D^2(\hat{\beta}_2) \tag{3.1}
\]

with respect to $p_1, \cdots, p_r$.


1 Cf. [1], p. 116. I am indebted to the referee for this and several other valuable remarks. 







It is well known that the covariance matrix $A$ of
$\hat{\beta}_1, \hat{\beta}_2$ is the inverse of the information matrix

\[
M = \begin{pmatrix}
\sum p_i x_{i1}^2 & \sum p_i x_{i1}x_{i2} \\
\sum p_i x_{i1}x_{i2} & \sum p_i x_{i2}^2
\end{pmatrix}
= \sum p_i\mathbf{x}_i\mathbf{x}_i'. \tag{3.2}
\]

Our object is to minimize the trace

\[
q = A_{11} + A_{22}
\]

of this inverse with respect to the $p_i$'s. A peculiar feature of this problem
is that the minimum point usually will lie on the boundary of the region (1.2),
some of the $p_i$'s being actually equal to zero.

Consider a point $P = (p_1, \cdots, p_r)$ in (1.2) in which $q$ reaches its
minimum. If $i$ and $j$ are two relevant sources, that is, if $p_i > 0$,
$p_j > 0$, any differential variation

\[
dp_i = -\delta, \qquad dp_j = \delta, \qquad dp_h = 0 \quad (h \ne i, j)
\]

of the coordinates leads to another point in (1.2). Accordingly, in order for
$P$ to be a minimum point, we must have
$(\partial q/\partial p_j - \partial q/\partial p_i)\delta = 0$ for all
$\delta$, that is, we must have
$\partial q/\partial p_i = \partial q/\partial p_j$ for any two relevant
sources $i, j$. If, on the other hand, $i$ is a relevant and $j$ an irrelevant
source (i.e., $p_i > 0$, $p_j = 0$), then $p_j$ can be varied only in the
positive direction, and we must, by the same argument as above, have
$(\partial q/\partial p_j - \partial q/\partial p_i)\delta \ge 0$ for any
positive $\delta$; hence $\partial q/\partial p_j \ge \partial q/\partial p_i$.
In conclusion: to any solution of our minimization problem there exists a
constant $-\kappa^2$ such that $\partial q/\partial p_i = -\kappa^2$ for all
relevant sources, whereas $\partial q/\partial p_i \ge -\kappa^2$ for
irrelevant sources.

As far as the relevant sources are concerned, $\kappa^2$ is the ordinary
Lagrange multiplier. Since $q$ is a homogeneous function of order $-1$ of
$p_1, \cdots, p_r$, and since, by the above result,

\[
\sum_{i=1}^{r} p_i\frac{\partial q}{\partial p_i}
= -\kappa^2\sum_{i=1}^{r} p_i = -\kappa^2,
\]

we conclude from Euler's identity that $\kappa^2$ equals the minimum value of
$q$. This also establishes the sign of $\kappa^2$ as positive, as already
anticipated in the notation.

We shall now compute the derivatives $\partial q/\partial p_i$. This is easily
done by differentiating the matrix identity $MA = I$ with respect to $p_i$
($i = 1, \cdots, r$). Since by (3.2)
$\partial M/\partial p_i = \mathbf{x}_i\mathbf{x}_i'$, a short calculation
gives

\[
\frac{\partial A}{\partial p_i} = -A\mathbf{x}_i\mathbf{x}_i'A
= -(A\mathbf{x}_i)(A\mathbf{x}_i)'.
\]

Hence

\[
\frac{\partial q}{\partial p_i} = \frac{\partial\,\mathrm{sp}\,A}{\partial p_i}
= -\mathrm{sp}\{(A\mathbf{x}_i)(A\mathbf{x}_i)'\} = -\|A\mathbf{x}_i\|^2, \tag{3.3}
\]

where $\|A\mathbf{x}_i\|$ denotes the length of the vector $A\mathbf{x}_i$. We
note that $\|A\mathbf{x}\|^2$ is a positive definite quadratic form in the
components of $\mathbf{x}$; hence, the equation
$\|A\mathbf{x}\|^2 = \text{Const.}$ represents an ellipse centered at the
origin.
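
Formula (3.3) and the Euler identity used above can be verified numerically on a small case. The sketch below compares a central finite difference of $q$ with $-\|A\mathbf{x}_i\|^2$; the three sources and the weights are hypothetical values chosen only for illustration, and $q$ is treated as a function of unconstrained $p_i$, as in the homogeneity argument.

```python
def covariance(p, xs):
    """Inverse A of the 2x2 information matrix M = sum p_i x_i x_i'."""
    m11 = sum(pi * x[0] * x[0] for pi, x in zip(p, xs))
    m12 = sum(pi * x[0] * x[1] for pi, x in zip(p, xs))
    m22 = sum(pi * x[1] * x[1] for pi, x in zip(p, xs))
    det = m11 * m22 - m12 * m12
    return m22 / det, -m12 / det, m11 / det      # A11, A12, A22


def q(p, xs):
    """Trace of the covariance matrix A."""
    a11, _, a22 = covariance(p, xs)
    return a11 + a22


def minus_norm_sq(p, xs, i):
    """Right-hand side of (3.3): -||A x_i||^2."""
    a11, a12, a22 = covariance(p, xs)
    x1, x2 = xs[i]
    v1, v2 = a11 * x1 + a12 * x2, a12 * x1 + a22 * x2
    return -(v1 * v1 + v2 * v2)


def central_difference(p, xs, i, h=1e-6):
    """Numerical derivative of q with respect to p_i."""
    up = list(p)
    up[i] += h
    down = list(p)
    down[i] -= h
    return (q(up, xs) - q(down, xs)) / (2 * h)


xs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hypothetical sources
p = [0.3, 0.3, 0.4]                         # hypothetical weights
errors = [abs(central_difference(p, xs, i) - minus_norm_sq(p, xs, i))
          for i in range(3)]
# Euler's identity for a function homogeneous of order -1.
euler_gap = abs(sum(pi * -minus_norm_sq(p, xs, i) for i, pi in enumerate(p))
                - q(p, xs))
print(errors, euler_gap)
```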

Combining the results of the three preceding paragraphs we have the follow- 
ing theorem. 

THEOREM. To any set $\{p_i\}$ that minimizes the function (3.1) there
corresponds an ellipse $E$, centered at the origin, such that all points
$\mathbf{x}_i$ representing relevant sources lie on $E$ and none of the points
representing irrelevant sources lie outside of $E$.

Since three points determine a conic centered at the origin, we conclude that, 
in general, there are at most three relevant sources. Even in the case where four or 
more source-points happen to lie on the same ellipse, and the rest inside it, it 
may be shown by a continuity argument that three relevant sources are enough 
for the minimization of q. 

Generalization. The preceding arguments apply to an arbitrary number $s$ of
parameters, the ellipse being replaced by an $(s - 1)$-dimensional ellipsoid or
hyperellipsoid in $R_s$. Hence, there will be at most $\tfrac{1}{2}s(s + 1)$
relevant sources. However, already for $s = 3$ the computation of the optimum
allocation becomes rather complicated.


4. Finding the weights. Simple examples show that the cases with two and 
with three relevant sources both actually occur. Assuming for the time being 
that we know how to pick these sources, we now want to find the weight dis- 
tribution between them as well as the minimum value of q. 

If there are two relevant sources corresponding, say, to $i = 1, 2$, we find the
estimates $\hat\alpha_1, \hat\alpha_2$ simply by solving the equations
$\bar y_i = x_{i1}\alpha_1 + x_{i2}\alpha_2$, $i = 1, 2$.
Performing the solution, taking the variances, and introducing polar coordinates
$r, \theta$ for the vectors $\mathbf{x}_1, \mathbf{x}_2$, we find

(4.1)    $$q = D^2(\hat\alpha_1) + D^2(\hat\alpha_2) = \frac{r_1^2 p_1 + r_2^2 p_2}{p_1 p_2\, r_1^2 r_2^2 \sin^2(\theta_2 - \theta_1)}.$$

The minimum of this expression is attained for

(4.2)    $$p_1 = r_2/(r_1 + r_2), \qquad p_2 = r_1/(r_1 + r_2),$$

and the minimum value itself is

(4.3)    $$q_{\min} = \left(\frac{r_1 + r_2}{r_1 r_2 \sin(\theta_2 - \theta_1)}\right)^2.$$
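As an illustration of (4.1) to (4.3), the sketch below evaluates $q$ on a grid of weights and compares the grid minimum with the closed form. It assumes the reading $q = (r_1^2p_1 + r_2^2p_2)/(p_1p_2r_1^2r_2^2\sin^2(\theta_2-\theta_1))$ of (4.1), which follows from $q = \mathrm{sp}\,\mathbf{M}/\det\mathbf{M}$ for a $2\times 2$ matrix; the numerical inputs are hypothetical.

```python
import math

def q_two_sources(p1, r1, r2, theta1, theta2):
    """q of (4.1) for weights p1 and p2 = 1 - p1 (assumed reading of (4.1))."""
    p2 = 1.0 - p1
    s2 = math.sin(theta2 - theta1) ** 2
    return (r1 ** 2 * p1 + r2 ** 2 * p2) / (p1 * p2 * r1 ** 2 * r2 ** 2 * s2)

def optimum_two_sources(r1, r2, theta1, theta2):
    """Weights (4.2) and minimum value (4.3)."""
    p1 = r2 / (r1 + r2)
    p2 = r1 / (r1 + r2)
    q_min = ((r1 + r2) / (r1 * r2 * math.sin(theta2 - theta1))) ** 2
    return p1, p2, q_min

# Hypothetical sources: lengths 1 and 2, separated by 45 degrees.
r1, r2, th1, th2 = 1.0, 2.0, 0.0, math.pi / 4
p1, p2, q_min = optimum_two_sources(r1, r2, th1, th2)
grid_min = min(q_two_sources(k / 1000.0, r1, r2, th1, th2) for k in range(1, 1000))
print(p1, p2, q_min, grid_min)
```

The longer vector receives the smaller weight, since each weight is proportional to the length of the other source.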

The case with three relevant sources $i = 1, 2, 3$ is somewhat more complicated.
The derivatives (3.3) being less convenient for actual computation of the
minimizing $p_i$ values, we replace $q$ by the homogeneous rational function $L/M$,
where $M$ is the determinant of the matrix (3.2) and $L$ is the trace of that matrix multiplied
by $p_1 + p_2 + p_3$. On the set $p_1 + p_2 + p_3 = 1$ we obviously have $q = L/M$.
Instead of minimizing $L/M$ on the set mentioned, we may find the required ratio
$p_1 : p_2 : p_3$ by minimizing $L$ under the restriction $M = \text{Const.}$ Introducing a
Lagrange multiplier $\lambda$ and differentiating $L - \lambda M$ we find the equations

(4.4)    $$\sum_{j=1}^{3} (l_{ij} - \lambda m_{ij})\,p_j = 0 \qquad (i = 1, 2, 3),$$

where

(4.5)    $$l_{ij} = r_i^2 + r_j^2, \qquad m_{ij} = r_i^2 r_j^2 \sin^2(\theta_i - \theta_j).$$

If $\lambda$ is any eigenvalue of the system (4.4) and $P$ a corresponding eigenvector, a
well known argument shows that $\lambda$ is the value of $L/M$ in $P$. Hence, the minimum
value of $q$ on (1.2) is equal to the smallest eigenvalue of (4.4) for which all components
of the eigenvector have the same sign. The required optimum allocation is given by these
components normalized to sum one.
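The argument referred to can be written out in one line. This sketch assumes, as the coefficients in (4.5) suggest, that $L$ and $M$ are the quadratic forms $L = \tfrac12\sum_{i,j} l_{ij}p_ip_j$ and $M = \tfrac12\sum_{i,j} m_{ij}p_ip_j$; multiplying the $i$th equation of (4.4) by $p_i$ and summing gives

```latex
\sum_{j=1}^{3}(l_{ij}-\lambda m_{ij})\,p_j = 0 \quad (i=1,2,3)
\;\Longrightarrow\;
\sum_{i,j} l_{ij}\,p_i p_j \;=\; \lambda \sum_{i,j} m_{ij}\,p_i p_j
\;\Longrightarrow\;
2L = 2\lambda M
\;\Longrightarrow\;
\lambda = L/M .
```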


5. Selection of sources. There remains the question how to pick the relevant 
sources from a given set of potential observations. 

From the theorem in Section 3 it is immediately seen that a source is certainly
irrelevant if it is represented by a point inside the convex polygon spanned by
all $\mathbf{x}$'s. A source $i$ is, furthermore, irrelevant if for any two subscripts $j, k$ $(\ne i)$,
the ellipse passing through $\mathbf{x}_i, \mathbf{x}_j, \mathbf{x}_k$ and centered at the origin leaves some
fourth $\mathbf{x}_h$ outside it. After discarding all such sources there might still be more
than three left. It is in principle always possible to examine these “eligible 
sources” three by three, determine the corresponding minimum values of q ac- 
cording to Section 4, and pick the triplet with the smallest value. Most of the 
triplets will actually reduce to pairs, one of the three sources being irrelevant in 
combination with the others. The occurrence of this case is most easily detected 
by means of the following criterion, which we mention without proof: 

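A source $\mathbf{x}_3$ is irrelevant in combination with $\mathbf{x}_1$ and $\mathbf{x}_2$ if and only if $\mathbf{x}_3$ lies inside
or on a certain ellipse, centered at the origin and passing through $\mathbf{x}_1$ and $\mathbf{x}_2$, with
parametric equation

(5.1)    $$\mathbf{x} = \frac{\mathbf{x}_1 \sin(t - \theta_1) + \mathbf{x}_2 \sin(t - \theta_2)}{\sin(\theta_2 - \theta_1)} \qquad (0 \le t \le 2\pi).$$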
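The criterion can be tried out numerically. The sketch below assumes that the ellipse in question is the level curve $\|\mathbf{A}\mathbf{x}\|^2 = q_{\min}$ of the theorem in Section 3, evaluated at the optimal two-source weights (4.2), and that (5.1) reads $\mathbf{x} = [\mathbf{x}_1\sin(t-\theta_1) + \mathbf{x}_2\sin(t-\theta_2)]/\sin(\theta_2-\theta_1)$. The sources are hypothetical and the function names are illustrative only.

```python
import math

def optimal_pair_inverse(x1, x2):
    """A = M^{-1} as (a11, a12, a22) for the two-source design with weights (4.2)."""
    r1, r2 = math.hypot(*x1), math.hypot(*x2)
    p1, p2 = r2 / (r1 + r2), r1 / (r1 + r2)
    m11 = p1 * x1[0] ** 2 + p2 * x2[0] ** 2
    m12 = p1 * x1[0] * x1[1] + p2 * x2[0] * x2[1]
    m22 = p1 * x1[1] ** 2 + p2 * x2[1] ** 2
    det = m11 * m22 - m12 ** 2
    return m22 / det, -m12 / det, m11 / det

def norm_sq_Ax(a, x):
    """||A x||^2, to be compared with kappa^2 = q_min = sp A."""
    a11, a12, a22 = a
    u = a11 * x[0] + a12 * x[1]
    v = a12 * x[0] + a22 * x[1]
    return u * u + v * v

def ellipse_point(x1, x2, t):
    """Point of the parametric curve (5.1), under the assumed reading."""
    th1 = math.atan2(x1[1], x1[0])
    th2 = math.atan2(x2[1], x2[0])
    s1, s2 = math.sin(t - th1), math.sin(t - th2)
    d = math.sin(th2 - th1)
    return (x1[0] * s1 + x2[0] * s2) / d, (x1[1] * s1 + x2[1] * s2) / d

def third_source_irrelevant(x1, x2, x3):
    """True if x3 lies inside or on the ellipse ||A x||^2 = q_min."""
    a = optimal_pair_inverse(x1, x2)
    q_min = a[0] + a[2]
    return norm_sq_Ax(a, x3) <= q_min * (1 + 1e-12)

x1 = (1.0, 0.0)                        # hypothetical source, r = 1, theta = 0
x2 = (math.sqrt(2.0), math.sqrt(2.0))  # hypothetical source, r = 2, theta = 45 degrees
a = optimal_pair_inverse(x1, x2)
for k in range(8):
    t = k * math.pi / 4
    print(t, norm_sq_Ax(a, ellipse_point(x1, x2, t)), a[0] + a[2])
```

For every $t$ the printed value of $\|\mathbf{A}\mathbf{x}(t)\|^2$ agrees with $q_{\min}$, so under these assumptions the parametric curve and the level curve coincide.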


In most practical situations, two sources picked by inspection will probably 
do without much loss of accuracy. 

Example. Take three sources with polar coordinates $(r, \theta)$, $(r, -\theta)$, and $(\rho, \varphi)$
respectively. We consider the two first as fixed, the third as variable. The equation
of the ellipse (5.1) becomes in rectangular coordinates $\xi^2\tan^2\theta + \eta^2\cot^2\theta = r^2$.
When $\mathbf{x}_3$ is inside this ellipse, the source 3 is irrelevant in combination with
1 and 2. There are, on the other hand, regions in which $\mathbf{x}_3$ 'knocks out' one of
the other sources. Writing (5.1) explicitly and interchanging the subscripts 1
(2) and 3 one finds, after some calculations, that source 1 (2) becomes irrelevant
when $\mathbf{x}_3$ moves outside the curve $\rho\,|\sin 2\varphi| = r\,|\sin 2\theta|$ in the first or third
(second or fourth) quadrant. As a result we have a cross-shaped figure: in the
center only sources 1, 2 are relevant, in the angle-fields only 2, 3 or 1, 3; along the
axes all three sources are relevant.


6. Observations at different costs. The preceding theory can easily be
adapted to the case where the potential observations are at different costs, say
$v_1, \cdots, v_r$ per unit. Let $n_1, \cdots, n_r$ be the number of times that the different
observations are repeated. If the total costs have to equal a prescribed amount
$C$, we have the restriction $\sum v_i n_i = C$ instead of $\sum n_i = n$. Dividing the
regression equations for the averaged observations $\bar y_i$ by $\sqrt{v_i}$ $(i = 1, \cdots, r)$ we get a
new set of regression equations

(6.1)    $$y_i^* = x_{i1}^*\alpha_1 + x_{i2}^*\alpha_2 + \frac{\eta_i}{\sqrt{p_i^*}},$$

where

(6.2)    $$y_i^* = \bar y_i/\sqrt{v_i}, \qquad x_{ij}^* = x_{ij}/\sqrt{v_i}, \qquad p_i^* = v_i n_i/C,$$

where $\eta_i$ is a random variable with mean zero and standard deviation $\sigma/\sqrt{C}$,
and where the weights $p_i^*$ are subject to the restrictions (1.2). This is precisely
the previous situation. One has only to enter the procedure with the modified
sources $\mathbf{x}_i^*$ and to remember that the outcoming $p_i^*$'s give the optimum allocation
of the costs, not of the observations themselves.
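The bookkeeping of this section can be sketched as follows, assuming the reading $\mathbf{x}_i^* = \mathbf{x}_i/\sqrt{v_i}$ and $p_i^* = v_in_i/C$ of (6.2). The sources, unit costs and budget are hypothetical, and the two-source weights are taken from (4.2) applied to the modified sources.

```python
import math

def cost_scaled_sources(sources, unit_costs):
    """x_i* = x_i / sqrt(v_i), the modified sources of (6.2)."""
    return [tuple(c / math.sqrt(v) for c in x) for x, v in zip(sources, unit_costs)]

def repetitions_from_cost_shares(cost_shares, unit_costs, total_cost):
    """n_i = C p_i* / v_i, obtained by inverting p_i* = v_i n_i / C."""
    return [total_cost * p / v for p, v in zip(cost_shares, unit_costs)]

sources = [(1.0, 0.0), (0.0, 2.0)]  # hypothetical sources
unit_costs = [1.0, 4.0]             # hypothetical costs v_1, v_2
total_cost = 100.0                  # hypothetical budget C

scaled = cost_scaled_sources(sources, unit_costs)
r1, r2 = math.hypot(*scaled[0]), math.hypot(*scaled[1])
shares = [r2 / (r1 + r2), r1 / (r1 + r2)]  # weights (4.2) for the modified sources
counts = repetitions_from_cost_shares(shares, unit_costs, total_cost)
print(scaled, shares, counts)
```

The resulting $n_i$ are in general not integers; rounding them is a separate practical step that the text does not discuss.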


REFERENCES 
[1] F. N. DAVID AND J. NEYMAN, "Extension of the Markoff theorem on least squares,"
Stat. Res. Memoirs, Vol. 2 (1938), pp. 105-116.

[2] H. HOTELLING, "Some improvements in weighing and other experimental techniques,"
Annals of Math. Stat., Vol. 15 (1944), pp. 297-306.





ON THE DISTRIBUTION OF TWO RANDOM MATRICES USED 
IN CLASSIFICATION PROCEDURES¹


By ROSEDITH SITGREAVES
Columbia University 


Summary. Two classification statistics discussed in the literature can be
written as functions of the elements of a $2 \times 2$ symmetric random matrix $M$. An
analytic derivation is given of the distribution of $M$, and of a related matrix $M^*$,
extending earlier work on distribution theory by Wald [1] and Anderson [2].


1. Introduction. A problem of classification discussed by Wald [1] and
Anderson [2] may be described as follows. We have $N_1 + N_2 + 1$ independent
$p$-dimensional chance vectors. We know that the first $N_1$ vectors are observations
from a population $\pi_1$, the following $N_2$ are observations from a population $\pi_2$,
and the last vector is an observation from a population $\pi$, where $\pi$ is either $\pi_1$
or $\pi_2$. It is assumed that the probability distribution in both $\pi_1$ and $\pi_2$ is
multivariate normal with the same covariance matrix $\Sigma$; the vector of expected values
is $\mu^{(1)}$ in $\pi_1$ and $\mu^{(2)}$ in $\pi_2$. The values of $\mu^{(1)}$, $\mu^{(2)}$, and $\Sigma$ are not known. Let $X$
denote the $p \times (N_1 + N_2 + 1)$ matrix of observations. On the basis of $X$ we want
to classify the last observation, $X_{N_1+N_2+1}$, as coming from $\pi_1$ or $\pi_2$; that is, we
want to make one of the two decisions, $\pi = \pi_1$ or $\pi = \pi_2$.

When the parameter values are known, the class of Bayes solutions is easily 
found, resulting in pairs of classification regions of the form 


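(1)    $$T \ge k \quad \text{and} \quad T < k,$$

where

(2)    $$T = X'_{N_1+N_2+1}\Sigma^{-1}(\mu^{(1)} - \mu^{(2)}) - \tfrac{1}{2}(\mu^{(1)} + \mu^{(2)})'\Sigma^{-1}(\mu^{(1)} - \mu^{(2)}).$$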


Both Wald and Anderson propose, therefore, the use of classification statistics 
derived from (2) by substituting estimates for the unknown parameter values. 
Wald considers principally the statistic 


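(3)    $$U = X'_{N_1+N_2+1}S^{-1}(\bar X^{(1)} - \bar X^{(2)}),$$

where

$$\bar X^{(1)} = (1/N_1)\sum_{t=1}^{N_1} X_t, \qquad \bar X^{(2)} = (1/N_2)\sum_{t=N_1+1}^{N_1+N_2} X_t,$$

and

$$S = \frac{1}{N_1 + N_2 - 2}\left[\sum_{t=1}^{N_1}(X_t - \bar X^{(1)})(X_t - \bar X^{(1)})' + \sum_{t=N_1+1}^{N_1+N_2}(X_t - \bar X^{(2)})(X_t - \bar X^{(2)})'\right].$$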


¹ Sponsored by the Office of Naval Research.


Anderson proposes rather the statistic

(4)    $$W = X'_{N_1+N_2+1}S^{-1}(\bar X^{(1)} - \bar X^{(2)}) - \tfrac{1}{2}(\bar X^{(1)} + \bar X^{(2)})'S^{-1}(\bar X^{(1)} - \bar X^{(2)}).$$

If we let $A = (N_1 + N_2 - 2)S$ and $[(N_1N_2)^{1/2}/(N_1 + N_2)^{1/2}](\bar X^{(1)} - \bar X^{(2)}) = Z$,
we can write $U = [(N_1 + N_2)^{1/2}(N_1 + N_2 - 2)/(N_1N_2)^{1/2}]\,X'_{N_1+N_2+1}A^{-1}Z$. Under
either alternative the vector variable $Z$ has an expected value
$[(N_1N_2)^{1/2}/(N_1 + N_2)^{1/2}](\mu^{(1)} - \mu^{(2)})$, and covariance matrix $\Sigma$. If $\pi = \pi_1$, the expected
value of $X_{N_1+N_2+1}$ is $\mu^{(1)}$; if $\pi = \pi_2$, the expected value of $X_{N_1+N_2+1}$ is $\mu^{(2)}$. Thus,
in either instance, the sampling distribution of $U$ is contained as a special case
of the sampling distribution of

(5)    $$V = k\,Y_1'A^{-1}Y_2,$$

where $k$ is a known scalar, $Y_1$ and $Y_2$ are $p$-dimensional normal variables with
expected values $\zeta$ and $\xi$, say, respectively, and $A$ is a $p \times p$ symmetric matrix
with a Wishart distribution involving $n$ degrees of freedom; the 3 sets of
variables are independently distributed with the same covariance matrix $\Sigma$. Further,
the statistic $W$ can be written

$$W = \Big(X_{N_1+N_2+1} - \frac{N_1\bar X^{(1)} + N_2\bar X^{(2)}}{N_1 + N_2}\Big)'S^{-1}(\bar X^{(1)} - \bar X^{(2)}) + \Big(\frac{N_1\bar X^{(1)} + N_2\bar X^{(2)}}{N_1 + N_2} - \tfrac{1}{2}(\bar X^{(1)} + \bar X^{(2)})\Big)'S^{-1}(\bar X^{(1)} - \bar X^{(2)})$$

$$= \Big(X_{N_1+N_2+1} - \frac{N_1\bar X^{(1)} + N_2\bar X^{(2)}}{N_1 + N_2}\Big)'S^{-1}(\bar X^{(1)} - \bar X^{(2)}) + \frac{N_1 - N_2}{2N_1 + 2N_2}(\bar X^{(1)} - \bar X^{(2)})'S^{-1}(\bar X^{(1)} - \bar X^{(2)}).$$

Or, if we let

$$\frac{(N_1 + N_2)^{1/2}}{(N_1 + N_2 + 1)^{1/2}}\Big(X_{N_1+N_2+1} - \frac{N_1\bar X^{(1)} + N_2\bar X^{(2)}}{N_1 + N_2}\Big) = Z^*,$$

we can write

$$W = \frac{(N_1 + N_2 + 1)^{1/2}(N_1 + N_2 - 2)}{(N_1N_2)^{1/2}}\,Z^{*\prime}A^{-1}Z + \frac{(N_1 - N_2)(N_1 + N_2 - 2)}{2N_1N_2}\,Z'A^{-1}Z.$$

The vector variable $Z^*$ is normally distributed, independently of $Z$, with
covariance matrix $\Sigma$. Under the hypothesis $\pi = \pi_1$, the expected value of $Z^*$ is
$[N_2/((N_1 + N_2)(N_1 + N_2 + 1))^{1/2}](\mu^{(1)} - \mu^{(2)})$; under the hypothesis $\pi = \pi_2$, the
expected value of $Z^*$ is $[-N_1/((N_1 + N_2)(N_1 + N_2 + 1))^{1/2}](\mu^{(1)} - \mu^{(2)})$. The
sampling distribution of $W$ under either alternative is thus a special case of the
sampling distribution of

(6)    $$W^* = a\,Y_1'A^{-1}Y_2 + b\,Y_2'A^{-1}Y_2,$$

where $a$ and $b$ are known scalars, and $Y_1$, $Y_2$, and $A$ are defined as before. In
the case of $W$, the vectors $\zeta$ and $\xi$ are proportional to $(\mu^{(1)} - \mu^{(2)})$.
Wald [1] investigated the general sampling distribution of $V$, and showed that
the statistic can be expressed as a function of 3 variables. These variables, which
he called $m_1$, $m_2$, and $m_3$, and which become $m_{11}$, $m_{22}$, and $m_{12}$ in our notation,
are the elements of the symmetric matrix

(7)    $$M = Y'B^{-1}Y,$$

where $Y = (Y_1, Y_2)$ and $B = A + YY'$. The classification statistic $V$ can be
written

(8)    $$V = \frac{k\,m_{12}}{(1 - m_{11})(1 - m_{22}) - m_{12}^2}.$$

Wald showed geometrically that the distribution of $M$ is a constant multiple of
the product of 3 factors, the first a product of gamma- and beta-functions, the
second an exponential term, and the third the expected value of a matrix of
noncentral Wishart variables which was not evaluated. Anderson [2] has
evaluated this product in the case when $\zeta$ and $\xi$ are proportional.

In this paper, we give an analytic derivation of the distribution of $M$ in the
case when $\zeta$ and $\xi$ are proportional, obtaining the constant of the distribution
(which Wald and Anderson did not obtain). From the distribution of $M$, we
obtain the distribution of the related matrix

(9)    $$M^* = Y'A^{-1}Y.$$

It can be easily shown that

(10)    $$M = M^*(I + M^*)^{-1}.$$
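One way to see (10), offered here as a sketch rather than as the author's own argument, is to invert $B = A + YY'$ by the standard identity for the inverse of a matrix plus a product (often attributed to Woodbury) and then substitute $M^* = Y'A^{-1}Y$:

```latex
B^{-1} = (A + YY')^{-1} = A^{-1} - A^{-1}Y\,(I + Y'A^{-1}Y)^{-1}\,Y'A^{-1},
\\[4pt]
M = Y'B^{-1}Y = M^* - M^*(I + M^*)^{-1}M^*
  = M^*(I + M^*)^{-1}\bigl[(I + M^*) - M^*\bigr]
  = M^*(I + M^*)^{-1}.
```

The same computation gives $I - M = (I + M^*)^{-1}$, hence $M^* = M(I - M)^{-1}$.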


These distributions are useful because of interest in the exact sampling
distributions of $U$ and $W$. Further, as will be shown in a subsequent paper, an
approach to the classification problem based on the principle of invariance
results in a complete class of classification regions depending only on the elements
of the matrix $M^*$, or equivalently of the matrix $M$, and on a single function of
the parameters.


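2. Distribution of $M$. We can write $\zeta = k_1\delta$ and $\xi = k_2\delta$, where $k_1$ and $k_2$ are
known scalars. The joint density function of $Y$ and $A$ is given by

(11)    $$p(Y, A) = \frac{|\Sigma|^{-\frac12(n+2)}\,|A|^{\frac12(n-p-1)}}{2^{\frac12 p(n+2)}\,\pi^{p + p(p-1)/4}\,\prod_{i=1}^{p}\Gamma(\tfrac12(n + 1 - i))}\,\exp\{-\tfrac12\lambda^2(k_1^2 + k_2^2) - \tfrac12\,\mathrm{tr}\,\Sigma^{-1}(A + YY') + \delta'\Sigma^{-1}(k_1Y_1 + k_2Y_2)\},$$

where $\lambda^2 = \delta'\Sigma^{-1}\delta$. We make the transformation $B = A + YY'$. This is a
one-to-one transformation with Jacobian 1. We have

(12)    $$p(Y, B) = \frac{|\Sigma|^{-\frac12(n+2)}\,|B - YY'|^{\frac12(n-p-1)}}{2^{\frac12 p(n+2)}\,\pi^{p + p(p-1)/4}\,\prod_{i=1}^{p}\Gamma(\tfrac12(n + 1 - i))}\,\exp\{-\tfrac12\lambda^2(k_1^2 + k_2^2) - \tfrac12\,\mathrm{tr}\,\Sigma^{-1}B + \delta'\Sigma^{-1}(k_1Y_1 + k_2Y_2)\}.$$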







There is a nonsingular matrix $\Psi$ such that $\Psi\Sigma\Psi' = I$ and $\delta'\Psi' = (\lambda, 0, \cdots, 0)$
with $\lambda \ge 0$. We make the transformation $Y^* = \Psi Y$ and $B^* = \Psi B\Psi'$. The
Jacobian of the transformation is $|\Sigma|^{\frac12(p+3)}$. Under the transformation

$$|B - YY'| = |\Psi^{-1}B^*\Psi'^{-1} - \Psi^{-1}Y^*Y^{*\prime}\Psi'^{-1}| = |\Psi^{-1}(B^* - Y^*Y^{*\prime})\Psi'^{-1}| = |\Psi\Psi'|^{-1}\,|B^* - Y^*Y^{*\prime}| = |\Sigma|\cdot|B^* - Y^*Y^{*\prime}|,$$

and

$$\delta'\Sigma^{-1}(k_1Y_1 + k_2Y_2) = \lambda(k_1y_{11}^* + k_2y_{12}^*).$$

Further,

$$M = Y'B^{-1}Y = Y^{*\prime}\Psi'^{-1}(\Psi^{-1}B^*\Psi'^{-1})^{-1}\Psi^{-1}Y^* = Y^{*\prime}\Psi'^{-1}\Psi'B^{*-1}\Psi\Psi^{-1}Y^* = Y^{*\prime}B^{*-1}Y^*.$$

The matrix $B$ is positive definite with probability 1, and the matrix $\Psi$ is
nonsingular, so that the matrix $B^*$ is positive definite with probability 1. We can
write $B^* = TT'$, where $T$ is a nonsingular triangular matrix whose elements are
functions of the $b_{ij}^*$, chosen so that

$$t_{11} = b_{11}^{*1/2}, \qquad t_{ij} = 0 \ \text{for } j > i.$$

We use the matrix $T$ to make the transformation $Y^* = TU$, where
$U = (U_1, U_2)$ has the same dimensions as $Y^*$. The Jacobian of the
transformation is $|T|^2 = |B^*|$. We have

$$|B^* - Y^*Y^{*\prime}| = |TT' - TUU'T'| = |T(I - UU')T'| = |T|^2\cdot|I - UU'| = |B^*|\cdot|I - U'U|,$$

since $|I - UU'| = |I - U'U|$. Also

$$M = Y^{*\prime}B^{*-1}Y^* = U'T'(TT')^{-1}TU = U'T'T'^{-1}T^{-1}TU = U'U.$$

The joint density function of $U$ and $B^*$ is given by

(13)    $$p(U, B^*) = \frac{|B^*|^{\frac12(n-p+1)}\,|I - U'U|^{\frac12(n-p-1)}}{2^{\frac12 p(n+2)}\,\pi^{p + p(p-1)/4}\,\prod_{i=1}^{p}\Gamma(\tfrac12(n + 1 - i))}\,\exp\{-\tfrac12\lambda^2(k_1^2 + k_2^2) - \tfrac12\,\mathrm{tr}\,B^* + \lambda b_{11}^{*1/2}(k_1u_{11} + k_2u_{12})\}.$$

The variables $b_{ij}^*$ range over all values such that $B^*$ is positive definite. The
space of $U$ is the set of points independent of the $b_{ij}^*$'s for which

$$(1 - U_1'U_1) \ge 0, \qquad (1 - U_2'U_2) \ge 0, \qquad |U'U| \ge 0,$$

and

$$|I - U'U| \ge 0.$$







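It can be shown (e.g., see [3]) that

(14)    $$\int_{\substack{B^*_{(1)}\ \text{pos. def.}\\ -\infty < b_{i1}^* < \infty}} |B^*|^{\frac12(n-p+1)}\,e^{-\frac12\mathrm{tr}\,B^*}\,db_{12}^*\cdots db_{pp}^* = 2^{\frac12(p-1)(n+2)}\,\pi^{p(p-1)/4}\,b_{11}^{*\,n/2}\,e^{-\frac12 b_{11}^*}\prod_{i=1}^{p-1}\Gamma(\tfrac12(n + 2 - i)), \qquad i = 2, \cdots, p,$$

where $B^*_{(1)} = (b_{ij}^* - b_{i1}^*b_{1j}^*/b_{11}^*)$. Hence

(15)    $$p(U, b_{11}^*) = \frac{\Gamma(\tfrac12(n + 1))\,|I - U'U|^{\frac12(n-p-1)}\,b_{11}^{*\,n/2}}{\Gamma(\tfrac12(n - p + 2))\,\Gamma(\tfrac12(n - p + 1))\,2^{\frac12(n+2)}\,\pi^{p}}\,\exp\{-\tfrac12 b_{11}^* - \tfrac12\lambda^2(k_1^2 + k_2^2) + \lambda b_{11}^{*1/2}(k_1u_{11} + k_2u_{12})\}.$$

Expanding $\exp(\lambda b_{11}^{*1/2}(k_1u_{11} + k_2u_{12}))$ in a power series and integrating with
respect to $b_{11}^*$, we obtain

$$p(U) = \frac{\Gamma(\tfrac12(n + 1))\,|I - U'U|^{\frac12(n-p-1)}\,e^{-\frac12\lambda^2(k_1^2 + k_2^2)}}{\Gamma(\tfrac12(n - p + 2))\,\Gamma(\tfrac12(n - p + 1))\,\pi^{p}}\sum_{j=0}^{\infty}\frac{\Gamma(\tfrac12(n + 2 + j))\,2^{j/2}\lambda^j(k_1u_{11} + k_2u_{12})^j}{j!}.$$

We can construct an orthogonal matrix $G = (g_{kj})$ as follows. Let

(16)    $$g_{1j} = u_{j1}\Big/\Big(\sum_{i=1}^{p}u_{i1}^2\Big)^{1/2} \qquad (j = 1, \cdots, p),$$

$$g_{21} = -\Big(\sum_{i=2}^{p}u_{i1}^2\Big)^{1/2}\Big/\Big(\sum_{i=1}^{p}u_{i1}^2\Big)^{1/2}, \qquad g_{2j} = u_{11}u_{j1}\Big/\Big[\Big(\sum_{i=2}^{p}u_{i1}^2\Big)^{1/2}\Big(\sum_{i=1}^{p}u_{i1}^2\Big)^{1/2}\Big] \quad (j = 2, \cdots, p).$$

For $k = 3, \cdots, p - 1$,

$$g_{kj} = 0 \ (j < k - 1), \qquad g_{k,k-1} = -\Big(\sum_{i=k}^{p}u_{i1}^2\Big)^{1/2}\Big/\Big(\sum_{i=k-1}^{p}u_{i1}^2\Big)^{1/2}, \qquad g_{kj} = u_{k-1,1}u_{j1}\Big/\Big[\Big(\sum_{i=k}^{p}u_{i1}^2\Big)^{1/2}\Big(\sum_{i=k-1}^{p}u_{i1}^2\Big)^{1/2}\Big] \ (j \ge k),$$

and

$$g_{p1} = \cdots = g_{p,p-2} = 0, \qquad g_{p,p-1} = -u_{p1}/(u_{p-1,1}^2 + u_{p1}^2)^{1/2}, \qquad g_{pp} = u_{p-1,1}/(u_{p-1,1}^2 + u_{p1}^2)^{1/2}.$$

We make the transformation $V = GU_2$. Under the transformation

$$|I - U'U| = (1 - U_1'U_1)(1 - U_2'U_2) - U_1'U_2U_2'U_1 = (1 - U_1'U_1)(1 - V'V) - v_1^2\,U_1'U_1,$$

and

$$u_{12} = \Big[v_1u_{11} - v_2\Big(\sum_{i=2}^{p}u_{i1}^2\Big)^{1/2}\Big]\Big/\Big(\sum_{i=1}^{p}u_{i1}^2\Big)^{1/2}.$$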




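Now we make the following transformation. Let

$$u_{11} = m_{11}^{1/2}\cos\theta_1,$$
$$u_{21} = m_{11}^{1/2}\sin\theta_1\cos\theta_2,$$
$$\cdots$$
$$u_{p-1,1} = m_{11}^{1/2}\sin\theta_1\sin\theta_2\cdots\sin\theta_{p-2}\cos\theta_{p-1},$$
$$u_{p1} = m_{11}^{1/2}\sin\theta_1\sin\theta_2\cdots\sin\theta_{p-2}\sin\theta_{p-1}.$$

The Jacobian of the transformation is

$$\tfrac12\,m_{11}^{\frac12(p-2)}\sin^{p-2}\theta_1\sin^{p-3}\theta_2\cdots\sin\theta_{p-2}$$

with $0 \le \theta_i \le \pi$ for $i = 1, 2, \cdots, p - 2$ and $0 \le \theta_{p-1} \le 2\pi$.
Under the transformation, $U_1'U_1 = m_{11}$ and

$$p(m_{11}, V, \theta) = \frac{\Gamma(\tfrac12(n + 1))\,e^{-\frac12\lambda^2(k_1^2 + k_2^2)}\,m_{11}^{\frac12(p-2)}\big[(1 - m_{11})(1 - \sum_{i=1}^{p}v_i^2) - m_{11}v_1^2\big]^{\frac12(n-p-1)}}{2\pi^{p}\,\Gamma(\tfrac12(n - p + 2))\,\Gamma(\tfrac12(n - p + 1))}\,\sin^{p-2}\theta_1\cdots\sin\theta_{p-2}\sum_{j=0}^{\infty}\frac{\Gamma(\tfrac12(n + 2 + j))\,2^{j/2}\lambda^j(k_1m_{11}^{1/2}\cos\theta_1 + k_2v_1\cos\theta_1 - k_2v_2\sin\theta_1)^j}{j!}.$$

Since

$$\int_0^{\pi/2}\sin^m\theta\cos^n\theta\,d\theta = \frac{\Gamma(\tfrac12(m + 1))\,\Gamma(\tfrac12(n + 1))}{2\,\Gamma(\tfrac12(m + n) + 1)},$$

it follows that

$$\prod_{i=2}^{p-2}\int_0^{\pi}\sin^{p-i-1}\theta_i\,d\theta_i = \prod_{i=2}^{p-2}2\int_0^{\pi/2}\sin^{p-i-1}\theta_i\,d\theta_i = \prod_{i=2}^{p-2}\frac{\Gamma(\tfrac12)\,\Gamma(\tfrac12(p - i))}{\Gamma(\tfrac12(p - i + 1))} = \frac{(\Gamma(\tfrac12))^{p-3}}{\Gamma(\tfrac12(p - 1))}.$$

Further, $\int_0^{2\pi}d\theta_{p-1} = 2\pi$ so that we have

$$p(m_{11}, V, \theta_1) = \frac{\Gamma(\tfrac12(n + 1))\,e^{-\frac12\lambda^2(k_1^2 + k_2^2)}\,m_{11}^{\frac12(p-2)}\big[(1 - m_{11})(1 - \sum_{i=1}^{p}v_i^2) - m_{11}v_1^2\big]^{\frac12(n-p-1)}}{\Gamma(\tfrac12(n - p + 2))\,\Gamma(\tfrac12(n - p + 1))\,\Gamma(\tfrac12(p - 1))\,\pi^{\frac12(p+1)}}\sum_{j=0}^{\infty}\sum_{l=0}^{\infty}\frac{\Gamma(\tfrac12(n + 2 + j + l))}{j!\,l!}\,(2^{1/2}\lambda)^{j+l}(k_1m_{11}^{1/2} + k_2v_1)^j(-k_2v_2)^l\sin^{p-2+l}\theta_1\cos^j\theta_1.$$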







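Since $\int_0^{\pi}\sin^m\theta\cos^n\theta\,d\theta = 0$ for $n$ odd, we obtain on integrating with respect
to $\theta_1$

(19)    $$p(m_{11}, V) = \frac{\Gamma(\tfrac12(n + 1))\,e^{-\frac12\lambda^2(k_1^2 + k_2^2)}\,m_{11}^{\frac12(p-2)}\big[(1 - m_{11})(1 - \sum_{i=1}^{p}v_i^2) - m_{11}v_1^2\big]^{\frac12(n-p-1)}}{\Gamma(\tfrac12(n - p + 2))\,\Gamma(\tfrac12(n - p + 1))\,\Gamma(\tfrac12(p - 1))\,\pi^{\frac12(p+1)}}\sum_{j=0}^{\infty}\sum_{l=0}^{\infty}\frac{\Gamma(\tfrac12(n + 2 + l) + j)\,\Gamma(\tfrac12(p - 1 + l))\,\Gamma(j + \tfrac12)\,(2^{1/2}\lambda)^{2j+l}}{\Gamma(\tfrac12(p + l) + j)\,(2j)!\,l!}\,(k_1m_{11}^{1/2} + k_2v_1)^{2j}(-k_2v_2)^{l}.$$

We partition the vector $V$ into two parts, the first part consisting of the
single element $v_1$, and the second part of the $(p - 1)$-dimensional vector $V^*$. In
a manner similar to that in which $U_1$ was transformed, we transform the vector
$V^*$ to a variable $m_{22\cdot1} = V^{*\prime}V^*$, and to $(p - 2)$ angles. After integrating with
respect to the angles, and simplifying the resulting expression, we obtain

$$p(m_{11}, m_{22\cdot1}, v_1) = \frac{\Gamma(\tfrac12(n + 1))\,e^{-\frac12\lambda^2(k_1^2 + k_2^2)}\,m_{11}^{\frac12(p-2)}\,m_{22\cdot1}^{\frac12(p-3)}}{\Gamma(\tfrac12(n - p + 2))\,\Gamma(\tfrac12(n - p + 1))\,\Gamma(\tfrac12(p - 1))\,\Gamma(\tfrac12)}\,\big[(1 - m_{11})(1 - m_{22\cdot1}) - v_1^2\big]^{\frac12(n-p-1)}\sum_{j=0}^{\infty}\frac{\Gamma(\tfrac12(n + 2) + j)}{\Gamma(\tfrac12 p + j)\,j!}\,\big(\tfrac12\lambda^2(k_1^2m_{11} + 2k_1k_2m_{11}^{1/2}v_1 + k_2^2(v_1^2 + m_{22\cdot1}))\big)^j.$$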


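Finally, we make the transformation

$$v_1 = m_{12}/m_{11}^{1/2}, \qquad m_{22\cdot1} = m_{22} - m_{12}^2/m_{11},$$

and we have

(21)    $$p(M) = \frac{\Gamma(\tfrac12(n + 1))\,e^{-\frac12\lambda^2(k_1^2 + k_2^2)}\,|M|^{\frac12(p-3)}\,|I - M|^{\frac12(n-p-1)}}{\Gamma(\tfrac12(n - p + 2))\,\Gamma(\tfrac12(n - p + 1))\,\Gamma(\tfrac12(p - 1))\,\Gamma(\tfrac12)}\sum_{j=0}^{\infty}\frac{\Gamma(\tfrac12(n + 2) + j)}{\Gamma(\tfrac12 p + j)\,j!}\,\big(\tfrac12\lambda^2(k_1^2m_{11} + 2k_1k_2m_{12} + k_2^2m_{22})\big)^j,$$

with $0 \le m_{11} \le 1$, $0 \le m_{22} \le 1$, $|M| \ge 0$, $|I - M| \ge 0$.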


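3. Distribution of $M^*$. Making the transformation defined by $M = M^*(I + M^*)^{-1}$, we obtain

$$p(M^*) = \frac{\Gamma(\tfrac12(n + 1))\,e^{-\frac12\lambda^2(k_1^2 + k_2^2)}\,|M^*|^{\frac12(p-3)}}{\Gamma(\tfrac12(n - p + 2))\,\Gamma(\tfrac12(n - p + 1))\,\Gamma(\tfrac12(p - 1))\,\Gamma(\tfrac12)}\sum_{j=0}^{\infty}\frac{\Gamma(\tfrac12(n + 2) + j)}{\Gamma(\tfrac12 p + j)\,j!}\cdot\frac{\big(\tfrac12\lambda^2(k_1^2m_{11}^* + 2k_1k_2m_{12}^* + k_2^2m_{22}^* + (k_1^2 + k_2^2)(m_{11}^*m_{22}^* - m_{12}^{*2}))\big)^j}{|I + M^*|^{\frac12(n+2)+j}}.$$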







Acknowledgment. Grateful acknowledgment is made to Prof. T. W. Anderson, 
under whose direction this paper was prepared. 


REFERENCES 


[1] A. WALD, "On a statistical problem arising in the classification of an individual into one
of two groups," Annals of Math. Stat., Vol. 15 (1944), pp. 145-162.

[2] T. W. ANDERSON, "Classification by multivariate analysis," Psychometrika, Vol. 16
(1951), pp. 31-50.

[3] T. W. ANDERSON AND M. A. GIRSHICK, "Some extensions of the Wishart distribution,"
Annals of Math. Stat., Vol. 15 (1944), pp. 345-357.





THE DISTRIBUTION OF THE NUMBER OF ISOLATES IN 
A SOCIAL GROUP¹


By Leo Katz 
University of North Carolina and Michigan State College 


1. Summary. The exact chance distribution of the number of isolates in a 
social group is found in this paper, using methods due to Fréchet. The binomial 
distribution fitted to the first two moments of the exact distribution is shown 
to give reasonably good approximation and a slightly coarser binomial approxi- 
mation is indicated. 


2. Introduction. Consider a group consisting of N individuals. Each designates 
d of the others with whom he would prefer to be associated in some specified 
activity, that is, each chooses d from N — 1 possible associates. In the context 
of the group and the specified activity, an individual is said to be an isolate if 
he is chosen by none of his fellow group members. It is immediately obvious that 
the number of isolates depends upon the size of the group, the number of choices 
permitted and the extent to which the group, as a social organism, provides ac- 
ceptance for joint activities for the individuals who compose the group. Thus, 
when N and d are fixed, the number of isolates becomes an important characteris- 
tic of the group structure. When it is important to state whether the number of 
isolates is unusually large or small, it is necessary that the chance distribution 
of this number be known. 

The history of attacks on the distribution problem is brief. Lazarsfeld, in a 
contribution to a paper by Moreno and Jennings [8], gave the expected (mean) 
number of isolates as 


$$N[(N - d - 1)/(N - 1)]^{N-1},$$


but made no attempt to obtain the distribution. Bronfenbrenner [1] gave
(without proof) an incorrect version of the distribution function. He gave the
expression, which he claimed was "developed deductively and checked by empirical
methods,"

(1)    $$P(i) = \Pr\{i \text{ or fewer isolates}\} = 1 - \frac{(N - i - 2)^{(d)}}{(N - 1)^{(d)}},$$

where $a^{(b)} = a(a - 1)(a - 2)\cdots(a - b + 1)$. This form gives completely
nonsensical results in application. Edwards [2] conjectured that the Bronfenbrenner
formula gives the probability of a given person's including in his list of $d$ at least
one of $i + 1$ specified names. Edwards then gave correctly the probability of
the maximum possible number of isolates,

(2)    $$P_{[N-1-d]} = \Pr\{N - 1 - d \text{ isolates}\} = \binom{N}{d+1}(d + 1)^{N-1-d}\Big/\binom{N-1}{d}^{N},$$

where $\binom{a}{b}$, $b \le a$, is the binomial coefficient $a!/[b!(a - b)!]$. Note that there
cannot be $N - d$ isolates, since $d$ persons can be chosen only for a maximum total
of $(N - 1)d$ times, less than the $Nd$ choices actually made.

¹ Work done under the sponsorship of the Office of Naval Research at Chapel Hill, North
Carolina, and presented at the Chicago meeting of the Institute of Mathematical Statistics,
December 27, 1950.

In the last paper cited above, Edwards went on to set up the probability of 
N — 2 — d isolates by eliminating irrelevant cases from those in which the 
isolates name d from a list of d + 2 while the nonisolates choose d from a list 
of d + 1 names, and indicated that the process might be continued to obtain 
the probabilities of N — 3 — d isolates, etc. The form of these results, it is stated, 
would indicate a complicated algebraic expression for the required probability 
distribution and the question is then raised whether the existing technique of 
experimentation should not be modified to meet the practical requirement of 
simple mathematical treatment. 

In this paper, we shall first obtain the exact distribution of the number of 
isolates on the assumption of random choice and second, we shall obtain an 
approximation which does satisfy the requirement of simple mathematical treat- 
ment. An example will be given to indicate the accuracy of the approximation 
for a typical application. 


3. Exact distribution of the number of isolates. It should first be remarked
that any division of the group into those who are isolates and those who may not
be produces two distinct patterns of choices. Each isolate selects $d$ from among
all those in the second group, but each member of the second group must select
$d$ from among those members of the second group not including himself. Let

$$p_{i_1, i_2, \cdots, i_k} = \Pr\{\text{individuals } i_1, i_2, \cdots, i_k \text{ are isolates}\}.$$

As an immediate consequence of the remark made above and the symmetry of
the situation,

(3)    $$p_{i_1, i_2, \cdots, i_k} = p_{1, 2, \cdots, k} = \frac{\binom{N-k}{d}^{k}\binom{N-k-1}{d}^{N-k}}{\binom{N-1}{d}^{k}\binom{N-1}{d}^{N-k}}$$

for every $(i_1, i_2, \cdots, i_k)$. Setting

(4)    $$S_k = \binom{N}{k}p_{1, 2, \cdots, k} = \binom{N}{k}\binom{N-k}{d}^{k}\binom{N-k-1}{d}^{N-k}\binom{N-1}{d}^{-N},$$

the principle of inclusion and exclusion ([3], ch. 4) gives immediately

(5)    $$P_{[k]} = \Pr\{\text{exactly } k \text{ isolates in the group}\} = \sum_{j=k}^{N-1-d}(-1)^{j-k}\binom{j}{k}S_j.$$

Equation (5) gives the exact probability of $k$ isolates, in a group of $N$ where
each individual makes $d$ choices, as a linear combination of the $S_j$.


The values of $S_k$ may be computed directly from (4) or recursively, noting
that $S_0 = 1$ and

(6)    $$\frac{S_{k+1}}{S_k} = \frac{N - k}{k + 1}\Big(1 - \frac{d}{N - k}\Big)^{k}\Big(1 - \frac{d}{N - k - 1}\Big)^{N-k-1}.$$

The form of the last term in (6) suggests interesting asymptotic behavior. We
are, however, less interested in asymptotic characteristics of the distribution
than in its properties for moderate values of $N$. We may take the asymptotic
behavior to give an indication of what may be a reasonable approximation, but
the quality of the approximation must be judged by results for typical cases;
here, $N$ is usually between 10 and 100. We shall later consider one such typical
example in which $N = 26$, $d = 3$.

If we do not require the values of the individual $P_{[k]}$ but are only interested
in the moments of the distribution of isolates, it turns out that the $S_k$ quantities
are of central importance. Fréchet [4] has shown that

(7)    $$\alpha_{(k)} = k!\,S_k,$$

where $\alpha_{(k)}$ is the $k$th factorial moment of the distribution, given by
$\alpha_{(k)} = \sum_i i^{(k)}P_{[i]}$. We shall have occasion to use these factorial moments in the
following section.
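Formulas (4) and (5) are easy to evaluate exactly with rational arithmetic. The sketch below assumes the reading of (4) as $S_k = \binom{N}{k}\binom{N-k}{d}^{k}\binom{N-k-1}{d}^{N-k}\binom{N-1}{d}^{-N}$ and of (5) as the usual inclusion-exclusion sum; the function names are illustrative only.

```python
from fractions import Fraction
from math import comb

def s_values(N, d):
    """S_k of (4) for k = 0, ..., N-1-d, as exact fractions."""
    total = comb(N - 1, d) ** N  # number of equally likely choice patterns
    return [
        Fraction(comb(N, k) * comb(N - k, d) ** k * comb(N - k - 1, d) ** (N - k), total)
        for k in range(N - d)
    ]

def exact_distribution(N, d):
    """P_[k] of (5): sum over j >= k of (-1)^(j-k) C(j, k) S_j."""
    s = s_values(N, d)
    return [
        sum((-1) ** (j - k) * comb(j, k) * s[j] for j in range(k, len(s)))
        for k in range(len(s))
    ]

# Smallest nontrivial case: three people, one choice each.
print(exact_distribution(3, 1))

# The case N = 26, d = 3 treated in the example of Section 5.
probs = exact_distribution(26, 3)
print([round(float(p), 6) for p in probs[:8]])
```

For $N = 3$, $d = 1$ the eight choice patterns can be listed by hand: only the two cyclic patterns leave nobody isolated, which agrees with $P_{[0]} = 1/4$.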


4. Approximate distribution of number of isolates. Since we know the exact
distribution, an approximate distribution is useful only if it is more easily
computed. It is easily shown (see Feller [3]) that, for $d$ fixed, the limiting distribution
is Poisson with $\Pr(k) = e^{-\lambda}\lambda^k/k!$, where $\lambda = N(1 - d/(N - 1))^{N-1}$. However,
for moderate values of $N$, the approximation is not good; an example is given
later.

Following the procedure of Kaplansky [7] produces a modified Poisson ap- 
proximation which is quite good. The drawback to this procedure is that com- 
putations are almost as difficult as for the exact distribution. Therefore, we seek 
another approximation to satisfy the dual requirements of accuracy and sim- 
plicity. 


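From (4) and (7), the mean and the variance of the number of isolates are
respectively,

(8)    $$\text{mean} = \alpha_{(1)} = N\Big(1 - \frac{d}{N - 1}\Big)^{N-1},$$

(9)    $$\text{variance} = \alpha_{(2)} + \alpha_{(1)} - \alpha_{(1)}^2 = N\Big(1 - \frac{d}{N - 1}\Big)^{N-1}\Big[1 + (N - 1 - d)\Big(1 - \frac{d}{N - 2}\Big)^{N-2} - N\Big(1 - \frac{d}{N - 1}\Big)^{N-1}\Big].$$

From (9), we see $\text{var}(k) = \text{mean}(k)\,[1 - (d + 1)(1 - d/(N - 2))^{N-2} + O(N^{-1})]$
$\approx \text{mean}(k)\,[1 - (d + 1)e^{-d}]$. Since the variance is less than the mean, the
binomial distribution, $b(x; n, p)$, is strongly suggested (choice being restricted to
simple distributions). We shall not insist that $n$ be an integer; thus, we have
essentially a beta distribution. For this distribution, $\alpha_{(r)} = n^{(r)}p^r$ and, fitting the
first two moments, we have

(10)    $$np = \alpha_{(1)} = N\Big(1 - \frac{d}{N - 1}\Big)^{N-1},$$

(11)    $$\frac{1}{n} = 1 - \frac{\alpha_{(2)}}{\alpha_{(1)}^2} = 1 - \Big(1 - \frac{1}{N}\Big)\Big[1 - \frac{d}{(N - 2)(N - 1 - d)}\Big]^{N-2}.$$

Also, since $\alpha_{(r+1)}/\alpha_{(r)} = (n - r)p$, we form the functions,

(12)    $$D_r = \frac{\alpha_{(r+1)}}{\alpha_{(r)}} - r\,\frac{\alpha_{(2)}}{\alpha_{(1)}} + (r - 1)\alpha_{(1)}, \qquad r = 2, 3, 4, \cdots,$$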

which vanish identically for the binomial distribution. These functions are equiva- 
lent to the “total criteria” proposed by Guldberg [6] and Frisch [5] for judging 
whether an observed series may be approximated by a binomial frequency func- 
tion. In their work, the approximation is considered to be good when the criterion 
functions of the moments of the observed series are close to zero. We shall extend 
the notion to cover the case of approximation of a more complicated probability 
law by the binomial law. 
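That the functions (12) vanish for the binomial law follows directly from its factorial moments. Assuming the reading $D_r = \alpha_{(r+1)}/\alpha_{(r)} - r\,\alpha_{(2)}/\alpha_{(1)} + (r-1)\alpha_{(1)}$ of (12), and using $\alpha_{(r)} = n^{(r)}p^r$:

```latex
\frac{\alpha_{(r+1)}}{\alpha_{(r)}} = (n-r)p, \qquad
\frac{\alpha_{(2)}}{\alpha_{(1)}} = (n-1)p, \qquad
\alpha_{(1)} = np,
\\[4pt]
D_r = (n-r)p - r(n-1)p + (r-1)np
    = np - rp - rnp + rp + rnp - np = 0 .
```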

Setting $r = 2$ and $r = 3$ in (12) gives two functions which are exactly
equivalent to the two criteria given by Guldberg (allowing for an omitted term in his
second result). Also, the complete set (12) is equivalent to Frisch’s total criteria 
for g = 1,h = 1, 2, 3, --- in his notation. Since his criteria for all other values 
of g may be expressed in terms of those for g = 1, (12) is equivalent to the com- 
plete set of conditions given by Frisch. 

Substituting from equation (7) into (12), we have

$$D_r = (r + 1)\frac{S_{r+1}}{S_r} - 2r\,\frac{S_2}{S_1} + (r - 1)S_1,$$

or, using (4) and (6),

(13)    $$D_r = (N - r)\Big(\frac{N - r - d}{N - r}\Big)^{r}\Big(\frac{N - r - 1 - d}{N - r - 1}\Big)^{N-r-1} - r(N - 1 - d)\Big(\frac{N - 2 - d}{N - 2}\Big)^{N-2} + (r - 1)N\Big(\frac{N - 1 - d}{N - 1}\Big)^{N-1}.$$

For large $N$, each power of a fraction in (13) of the form $((a - d)/a)^a$ is
approximately equal to $e^{-d}$ and $D_r = 0$, approximately. In the limit, every $D_r = 0$.
The asymptotic form of the distribution in this sense is, therefore, binomial.
Further, the approximation should remain good even for moderate values of
$N$ (particularly when $r$ is small) since the errors made by the exponential
approximation are not only small but tend to compensate for each other.


We may, then, use a binomial probability law approximation with $p$ and $n$
given by (10) and (11). (If $1/n$ in (11) is evaluated to terms of $O(N^{-1})$, we find
$1/n = (d + 1)/(N - 1 - d)$ or $n = N/(d + 1) - 1$, approximately. This seems
consistently to understate the value of $n$ from (11); accordingly, it is suggested
that $n$ be approximated by


n= ee seal 
d+ 1 i 
In the next section, we shall compare this approximation with the exact
distribution for a typical pair of values of $N$ and $d$. We also give, for comparison,
the Poisson approximation.


(11a)


TABLE 1
Comparison of the exact and approximate distributions of the number of isolates
for N = 26, d = 3

 k   S_k             P[k] (exact)   p_k (approx.)   p_k - P[k]   Poisson    Poisson - P[k]

 0   1.000 0000      .309 794       .311 098         .0013       .344 989    .0352
 1   1.064 2429      .402 574       .399 727        -.0028       .367 152   -.0354
 2    .474 9281      .214 316       .215 365         .0010       .195 370   -.0189
 3    .116 8650      .061 532       .062 473         .0009       .069 306    .0078
 4    .017 5606      .010 564       .010 354        -.0002       .018 440    .0079
 5    .001 6882      .001 138       .000 943        -.0002       .003 925    .0028
 6    .000 10596     .000 079       .000 039        -.00004      .000 696    .00062
 7    .000 0043 61   .000 003       .000 0002       -.000003     .000 106    .000103
 8    .000 0001 17                                               .000 014
 9    .000 0000 02                                               .000 002


5. An example. Moreno and Jennings [8] considered in some detail the case
$N = 26$, $d = 3$. Since, also, a number of later writers have treated the same case
as a reasonably typical one, we will test the accuracy of the approximation in
this situation. The computation of the exact probability distribution seems to be
best performed in two stages. In the first, the logarithms of the ratios $S_{k+1}/S_k$
of equation (6) are obtained using 7-place tables, and the $S_k$ themselves obtained
from the partial sums of the logarithms. These values appear in the second
column of Table 1. In the second stage of the computation, the exact
probabilities are found by setting the $S_k$ into (5). The exact probabilities are given to six
decimals in the third column of the table.

In the computation of the approximate probabilities, we take advantage of
the already computed values of $S_1$ and $S_2$ and equation (7) to obtain directly
the factorial moments of (8) and (9). From (10) and (11), we have $p = .1717247$
and $n = 6.197378$. We then compute the binomial probabilities, $p_i = b(i; n, p)$,
$(i = 0, 1, 2, \cdots, ([n] + 1))$, where $[n]$ is the largest integer in $n$, in this case, 6,
using $p_0 = (1 - p)^n$ and $p_{i+1}/p_i = (n - i)p/[(i + 1)(1 - p)]$ as suggested by
Guldberg [6] and others. The approximate probabilities, $p_i$, appear in the fourth
column of the table to six decimals. It will be seen that the fit to three decimals
is almost exact and certainly good enough for tests of significance. The
discrepancies, $p_i - P_{[i]}$, are given in the fifth column. The Poisson probabilities
and errors appear in the sixth and seventh columns.
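The fitting procedure just described can be sketched in a few lines. The code assumes the reading $np = N(1 - d/(N-1))^{N-1}$ of (10) and $1/n = 1 - (1 - 1/N)[1 - d/((N-2)(N-1-d))]^{N-2}$ of (11), and uses the recurrence quoted above; the function names are illustrative only. Because the original values were obtained with 7-place logarithm tables, agreement with the printed $n$ and $p$ should be expected only to about four decimals.

```python
def fit_binomial(N, d):
    """Return (n, p) matching the first two moments, following (10) and (11)."""
    mean = N * (1 - d / (N - 1)) ** (N - 1)
    inv_n = 1 - (1 - 1 / N) * (1 - d / ((N - 2) * (N - 1 - d))) ** (N - 2)
    n = 1 / inv_n
    return n, mean / n

def generalized_binomial(n, p, terms):
    """p_0 = (1-p)^n, then p_{i+1} = p_i (n-i) p / ((i+1)(1-p)); n need not be an integer."""
    probs = [(1 - p) ** n]
    for i in range(terms - 1):
        probs.append(probs[-1] * (n - i) * p / ((i + 1) * (1 - p)))
    return probs

n, p = fit_binomial(26, 3)
approx = generalized_binomial(n, p, int(n) + 2)  # i = 0, 1, ..., [n] + 1
print(n, p)
print([round(x, 6) for x in approx])
```

Stopping at $i = [n] + 1$ keeps every factor $(n - i)$ in the recurrence positive, so all the computed terms are positive.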

The discrepancies for the "binomial" approximation are not particularly
systematic except in the upper tail of the distribution, where the binomial gives
zero probability for all numbers of isolates above seven. Although numbers
through 22 are possible, they are so unlikely to occur by chance that this
possibility may be practically disregarded. For example, the exact probability of
eight isolates by chance is about one in ten million. The Poisson distribution
appears to be 'flatter' than the exact, understating probabilities for the central
values and overstating for both tails.

As a further check on the accuracy of the approximation, the values of
$\gamma_1 = \mu_3/\mu_2^{3/2}$ and $\gamma_2 = \mu_4/\mu_2^2$ were computed for the exact distribution and for the
"binomial" approximation. These computations give $\gamma_1 = .7193$ for the exact,
.6993 for the approximate distribution; $\gamma_2 = 3.2620$ and 3.1663, respectively.


REFERENCES 

[1] U. BRONFENBRENNER, "The measurement of sociometric status, structure and
development," Sociometry, Vol. 6 (1943), pp. 363-397. Reprinted as Sociometry Monograph
No. 6, Beacon House, New York, 1945.

[2] D. S. EDWARDS, "The constant frame of reference problem in sociometry," Sociometry,
Vol. 11 (1948), pp. 372-379.

[3] W. FELLER, An Introduction to Probability Theory and Its Applications, John Wiley and
Sons, 1950.

[4] M. FRÉCHET, Les Probabilités Associées à un Système d'Événements Compatibles et
Dépendants, Actualités Scientifiques et Industrielles, Nos. 859 and 942, Hermann et
Cie, Paris, 1940 and 1943.

[5] R. FRISCH, "On the use of difference equations in the study of frequency distributions,"
Metron, Vol. 10 (1932), pp. 35-59.

[6] A. GULDBERG, "On discontinuous frequency functions and statistical series,"
Skandinavisk Aktuarietidskrift, Vol. 14 (1931), pp. 161-187.

[7] I. KAPLANSKY, "The asymptotic distribution of runs of consecutive elements," Annals
of Math. Stat., Vol. 16 (1945), pp. 200-203.

[8] J. L. MORENO AND H. H. JENNINGS, "Statistics of social configurations," Sociometry,
Vol. 1 (1938), pp. 342-374. Reprinted as Sociometry Monograph No. 3, Beacon
House, New York, 1945.





NOTES 


JUSTIFICATION AND EXTENSION OF DOOB’S HEURISTIC APPROACH
TO THE KOLMOGOROV-SMIRNOV THEOREMS¹


By Monroe D. Donsker


University of Minnesota


1. Introduction and summary. Doob [1] has given heuristically an appealing 
methodology for deriving asymptotic theorems on the difference between the 
empirical distribution function calculated from a sample and the actual dis- 
tribution function of the population being sampled. In particular he has applied 
these methods to deriving the well known theorems of Kolmogorov [2] and 
Smirnov [3]. In this paper we give a justification of Doob’s approach to these 
theorems and show that the method can be extended to a wide class of such 
asymptotic theorems. 


2. The justification for Kolmogorov’s theorem. Let $x_1, x_2, \cdots$ be mutually
independent, identically distributed random variables with distribution function
$F(\lambda)$, and let $\nu_n(\lambda)$ be the number of $x_i$’s among $x_1, x_2, \cdots, x_n$
which are $\le \lambda$. In studying the difference between the empirical distribution
function, $\nu_n(\lambda)/n$, and $F(\lambda)$, Kolmogorov showed that if $F(\lambda)$ is
continuous, the distribution of

(2.1)    $\operatorname*{l.u.b.}_{-\infty < \lambda < +\infty} \left( \frac{\nu_n(\lambda)}{n} - F(\lambda) \right)$

is independent of $F(\lambda)$. For convenience, therefore, we will assume that the
variables are uniformly distributed on (0, 1), that is, $F(\lambda) = \lambda$ for
$0 \le \lambda \le 1$.

Let²

(2.2)    $D_n^+ = \operatorname*{l.u.b.}_{0 \le \lambda \le 1} \left( \frac{\nu_n(\lambda)}{n} - \lambda \right)$.

One of Kolmogorov’s theorems states

(2.3)    $\lim_{n \to \infty} P\{n^{1/2} D_n^+ < \alpha\} = 1 - e^{-2\alpha^2}$,

and for our purposes it will be sufficient to justify Doob’s method for this
particular theorem since the justification of the method in general follows from it.
Following Doob, define

(2.4)    $x_n(t) = n^{1/2} \left( \frac{\nu_n(t)}{n} - t \right), \quad 0 \le t \le 1$.
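The limit in (2.3) can be checked numerically. The following is only a simulation sketch, not part of the argument: it assumes the variables are uniform on (0, 1), as in the text, and the function names `d_plus`, `limit_cdf` and `simulate` are illustrative choices, not notation from the paper.

```python
import math
import random


def d_plus(sample):
    """Return D_n^+ = l.u.b. over t of (nu_n(t)/n - t) for a sample from U(0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    # nu_n(t)/n - t jumps up at each order statistic and decreases in between,
    # so the supremum is attained at one of the order statistics.
    return max((i + 1) / n - x for i, x in enumerate(xs))


def limit_cdf(alpha):
    """Right-hand side of (2.3): 1 - exp(-2 alpha^2)."""
    return 1.0 - math.exp(-2.0 * alpha * alpha)


def simulate(n, alpha, trials, seed=0):
    """Monte Carlo estimate of P{ n^(1/2) D_n^+ < alpha } for sample size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        d = d_plus([rng.random() for _ in range(n)])
        if math.sqrt(n) * d < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for alpha in (0.5, 1.0, 1.5):
        print(alpha, simulate(200, alpha, 2000), limit_cdf(alpha))
```

For moderate n the simulated frequency should lie close to the limiting value, with the agreement improving as n grows.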


¹ Research begun while the writer was a member of an ONR sponsored project in
probability at Cornell University.
² For ease of comparison, we are using Doob’s notation wherever possible.




Clearly, 


(2.5)    $E\{x_n(t)\} = 0, \quad 0 \le t \le 1$,
         $E\{[x_n(t) - x_n(s)]^2\} = (t - s)[1 - (t - s)], \quad 0 \le s \le t \le 1$.
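The second relation in (2.5) follows in one line from the binomial law of the increment. The short derivation below is a sketch that assumes only what the text already assumes, namely that the variables are independent and uniform on (0, 1).

```latex
% For 0 <= s <= t <= 1, the number of observations falling in (s, t] is binomial:
\nu_n(t) - \nu_n(s) \sim \mathrm{Bin}(n,\, t - s),
\qquad
x_n(t) - x_n(s) = n^{-1/2}\bigl[\nu_n(t) - \nu_n(s) - n(t - s)\bigr].
% Hence the mean is zero and the second moment is the binomial variance divided by n:
E\{[x_n(t) - x_n(s)]^2\}
  = \frac{1}{n}\, n (t - s)\bigl[1 - (t - s)\bigr]
  = (t - s)\bigl[1 - (t - s)\bigr].
```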


Let $\{x(t)\}$ be a one parameter family of random variables, $0 \le t \le 1$, with the
properties:

(a) for each $j$, if $0 \le t_1 < \cdots < t_j \le 1$, the $j$-variate distribution of the
variables $x(t_1), x(t_2), \cdots, x(t_j)$ is Gaussian;

(b) (2.6)    $E\{x(t)\} = 0, \quad 0 \le t \le 1$,
             $E\{[x(t) - x(s)]^2\} = (t - s)[1 - (t - s)], \quad 0 \le s \le t \le 1$;

(c) $P\{x(0) = 0\} = 1$.


The $x(t)$ process can be selected so that with probability one it has continuous
sample functions. Let $Y$ be the space of these sample functions. The $x(t)$ process
selected here is such that for any $j$, if $0 \le t_1 < \cdots < t_j \le 1$ and if
$(a_1, a_2, \cdots, a_j)$ is an arbitrary vector, we have from the central limit theorem

(2.7)    $\lim_{n \to \infty} P\{x_n(t_i) \le a_i;\ i = 1, 2, \cdots, j\} = P\{x(t_i) \le a_i;\ i = 1, 2, \cdots, j\}$.

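Doob’s heuristic argument consisted in assuming that in calculating asymptotic
$x_n(t)$ process distributions when $n \to \infty$, one could replace the $x_n(t)$
process by the $x(t)$ process. In particular, with reference to (2.3), his assumption
was that

(2.8)    $\lim_{n \to \infty} P\{n^{1/2} D_n^+ < \alpha\} = P\{D^+ < \alpha\}$,

where $D^+ = \max_{0 \le t \le 1} x(t)$. What we wish to show, therefore, is that

(2.9)    $\lim_{n \to \infty} P\left\{ \operatorname*{l.u.b.}_{0 \le t \le 1} \left[ n^{1/2} \left( \frac{\nu_n(t)}{n} - t \right) \right] \le \alpha \right\} = P\left\{ \max_{0 \le t \le 1} x(t) \le \alpha \right\}$.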


Let $E_n$ be the event that for all $t$ in (0, 1), $\nu_n(t) \le \alpha n^{1/2} + nt$, and
let $E$ be the event that for all $t$ in (0, 1), $x(t) \le \alpha$. We can write (2.9) as

(2.10)    $\lim_{n \to \infty} P\{E_n\} = P\{E\}$.

Let $E'_n$ be the event that for all $i = 1, 2, \cdots, n$, $\nu_n(i/n) \le \alpha n^{1/2} + i$,
and let $E''_n$ be the event that for all $i = 1, 2, \cdots, n$,
$\nu_n(i/n) \le \alpha n^{1/2} + i - 1$. We have, clearly, $E''_n \subset E_n \subset E'_n$.
In what follows we will show that

(2.11)    $\lim_{n \to \infty} P\{E'_n\} = P\{E\}$,

and an exactly similar argument shows $\lim_{n \to \infty} P\{E''_n\} = P\{E\}$. Hence, we will
have shown (2.10).







To show (2.11), let $N$ be a Poisson distributed random variable with mean
$n$ and independent of the random variables $x_1, x_2, x_3, \cdots$. We have, clearly,

(2.12)    $P\{E'_n\} = P\{\nu_N(i/n) \le \alpha n^{1/2} + i;\ i = 1, 2, \cdots, n \mid N = n\}$.

Let $y_1 = \nu_N(1/n)$, $y_i = \nu_N(i/n) - \nu_N((i - 1)/n)$, $i = 2, 3, \cdots, n$. The variables
$y_1, y_2, \cdots, y_n$ are independent (cf. Kac [4]), are Poisson distributed with mean
1, and if we let $z_i = y_i - 1$, $i = 1, 2, \cdots, n$, $s_m = z_1 + z_2 + \cdots + z_m$, then
$s_m$ is a sum of independent variables and we can rewrite (2.12) as

(2.13)    $P\{E'_n\} = P\{s_i \le \alpha n^{1/2};\ i = 1, 2, \cdots, n \mid s_n = 0\}$.
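The Poisson device used for (2.12) and (2.13) rests on a standard fact: independent Poisson cell counts with mean 1, conditioned on their total being n, have the same law as the cell counts of n uniform observations in n equal cells. The sketch below verifies this by enumeration for small n; the function names are illustrative and not from the paper.

```python
from itertools import product
from math import exp, factorial, prod


def poisson1(m):
    """P{y = m} for a Poisson variable with mean 1."""
    return exp(-1.0) / factorial(m)


def conditional_poisson(counts):
    """P{y_1 = c_1, ..., y_n = c_n | y_1 + ... + y_n = n}, y_i independent Poisson(1)."""
    n = len(counts)
    if sum(counts) != n:
        return 0.0
    joint = prod(poisson1(c) for c in counts)
    p_sum = exp(-n) * n**n / factorial(n)  # P{N = n} for N Poisson with mean n
    return joint / p_sum


def multinomial_equal_cells(counts):
    """Probability that n uniform points put c_i points in the i-th of n equal cells."""
    n = len(counts)
    if sum(counts) != n:
        return 0.0
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    return coef / n**n


def max_discrepancy(n):
    """Largest absolute difference between the two laws over all count vectors."""
    return max(
        abs(conditional_poisson(c) - multinomial_equal_cells(c))
        for c in product(range(n + 1), repeat=n)
    )


if __name__ == "__main__":
    print(max_discrepancy(4))
```

The factor $e^{-n} n^n / n!$ computed here as `p_sum` is the probability $P\{s_n = 0\}$, whose reciprocal appears later in the argument.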


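Now,

(2.14)    $1 - P\{E'_n\} = \sum_{r=1}^{n} P\{s_i \le \alpha n^{1/2};\ i = 1, 2, \cdots, r - 1,\ s_r > \alpha n^{1/2} \mid s_n = 0\}$.

Let $k$ be a fixed positive integer; define $n_j = [jn/k]$, $j = 0, 1, 2, \cdots, k$, and
let an $\epsilon > 0$ be given. From (2.14) we obtain

(2.15)    $1 - P\{E'_n\} = \sum_{j=0}^{k-1} \sum_{n_j < r \le n_{j+1}} P\{s_i \le \alpha n^{1/2};\ i = 1, 2, \cdots, r - 1,\ s_r > \alpha n^{1/2},\ |s_{n_{j+1}} - s_r| < \epsilon n^{1/2} \mid s_n = 0\}$
          $\qquad + \sum_{j=0}^{k-1} \sum_{n_j < r \le n_{j+1}} P\{s_i \le \alpha n^{1/2};\ i = 1, 2, \cdots, r - 1,\ s_r > \alpha n^{1/2},\ |s_{n_{j+1}} - s_r| \ge \epsilon n^{1/2} \mid s_n = 0\}$.

Let $P_{n,k}(\alpha) = P\{s_{n_j} \le \alpha n^{1/2};\ j = 1, 2, \cdots, k \mid s_n = 0\}$. Clearly,

(2.16)    $P\{E'_n\} \le P_{n,k}(\alpha)$,

and also the first sum on the right of (2.15) is less than $1 - P_{n,k}(\alpha - \epsilon)$. The
second sum on the right of (2.15) can be written as (cf. Chung [5], pp. 39-41)

(2.17)    $\frac{n!\, e^n}{n^n} \sum_{j=0}^{k-1} \sum_{n_j < r \le n_{j+1}} P\{s_i \le \alpha n^{1/2};\ i = 1, 2, \cdots, r - 1,\ s_r > \alpha n^{1/2},\ |s_{n_{j+1}} - s_r| \ge \epsilon n^{1/2},\ s_n = 0\}$
          $= \frac{n!\, e^n}{n^n} \Big[ \sum_{j=0}^{k-2} \sum_{n_j < r \le n_{j+1}} P\{s_i \le \alpha n^{1/2};\ i = 1, 2, \cdots, r - 1,\ s_r > \alpha n^{1/2}\}$
          $\qquad \cdot \sum_y P\{|s_{n_{j+1}} - s_r| \ge \epsilon n^{1/2},\ s_{n_{j+1}} = y\}\, P\{s_n - s_{n_{j+1}} = -y\}$
          $\qquad + \sum_{n_{k-1} < r \le n} P\{s_i \le \alpha n^{1/2};\ i = 1, 2, \cdots, r - 1,\ s_r > \alpha n^{1/2},\ |s_n - s_r| \ge \epsilon n^{1/2},\ s_n = 0\} \Big]$.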




To estimate the first term in the brackets we note that since the $z_i$’s are
distributed as follows:

          $P\{z_i = m - 1\} = \frac{e^{-1}}{m!}, \quad m = 0, 1, \cdots$;

we have, noting the maximum term of the Poisson distribution,

(2.18)    $P\{s_n - s_{n_{j+1}} = -y\} \le A_1 k^{1/2} n^{-1/2}$,

where $A_1$ is an absolute constant. Also, from Tchebycheff’s inequality we get

(2.19)    $\sum_y P\{|s_{n_{j+1}} - s_r| \ge \epsilon n^{1/2},\ s_{n_{j+1}} = y\} = P\{|s_{n_{j+1}} - s_r| \ge \epsilon n^{1/2}\} \le \frac{1}{k \epsilon^2}$.

The first term in the brackets on the right of (2.17) is therefore less
than $A_1 k^{-1/2} n^{-1/2} \epsilon^{-2}$.

The second term in the brackets on the right of (2.17) is less than

          $\sum_{n_{k-1} < r \le n} \sum_y P\{s_i \le \alpha n^{1/2};\ i = 1, 2, \cdots, r - 1,\ s_r = y\}\, P\{s_n - s_r = -y\}$,

and using similar estimates is shown to be less than $A_2 k^{-1/2} n^{-1/2}$, where $A_2$ is an
absolute constant. Thus, we have from (2.15)

(2.20)    $1 - P\{E'_n\} \le 1 - P_{n,k}(\alpha - \epsilon) + \frac{n!\, e^n}{n^n} \cdot \frac{A_3}{k^{1/2} n^{1/2} \epsilon^2}$.
n” kinte 


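This together with (2.16) gives us

(2.21)    $P_{n,k}(\alpha - \epsilon) - \frac{n!\, e^n}{n^n} \cdot \frac{A_3}{k^{1/2} n^{1/2} \epsilon^2} \le P\{E'_n\} \le P_{n,k}(\alpha)$.

From (2.7) we have

(2.22)    $\lim_{n \to \infty} P_{n,k}(\alpha) = P\{x(j/k) \le \alpha;\ j = 1, 2, \cdots, k\}$.

If in (2.21) we hold $k$ and $\epsilon$ fixed and let $n \to \infty$, we get from (2.22) and
Stirling’s formula that

(2.23)    $P\{x(j/k) \le \alpha - \epsilon;\ j = 1, 2, \cdots, k\} - \frac{(2\pi)^{1/2} A_3}{k^{1/2} \epsilon^2} \le \liminf_{n \to \infty} P\{E'_n\}$
          $\qquad \le \limsup_{n \to \infty} P\{E'_n\} \le P\{x(j/k) \le \alpha;\ j = 1, 2, \cdots, k\}$.

In (2.23), if we hold $\epsilon$ fixed and let $k \to \infty$ we get from the continuity of the
$x(t)$ process that

          $P\{x(t) \le \alpha - \epsilon,\ t \in (0, 1)\} \le \liminf_{n \to \infty} P\{E'_n\} \le \limsup_{n \to \infty} P\{E'_n\}$
          $\qquad \le P\{x(t) \le \alpha,\ t \in (0, 1)\}$.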







Now finally, using the fact that the distribution function of $\max_{0 \le t \le 1} x(t)$ is
continuous, and letting $\epsilon \to 0$ we obtain the desired statement (2.11).


3. Extension. Having shown that

(3.1)    $\lim_{n \to \infty} P\{\operatorname*{l.u.b.}_{0 \le t \le 1} x_n(t) \le \alpha\} = P\{\max_{0 \le t \le 1} x(t) \le \alpha\}$,

it is possible, using methods identical to those used by the writer in a recent
paper (Donsker [6]), to obtain a general theorem like (3.1), but where the
functional $\max_{0 \le t \le 1} x(t)$ is replaced by an arbitrary functional $F[x(\cdot)]$ subject to
certain restrictions. Indeed, we can obtain the following theorem.

THEOREM. Let $R$ be the space of real, single-valued functions $g(t)$ which are
continuous on $0 \le t \le 1$ except for at most a finite number of finite jumps. Let $F[g]$ be
a functional defined on $R$ and continuous in the uniform topology at almost all
points of $Y$.³ Then,

(3.2)    $\lim_{n \to \infty} P\{F[x_n(t)] \le \alpha\} = P\{F[x(t)] \le \alpha\}$

at all points of continuity of the distribution function on the right.

This theorem is proved (precisely as is the main theorem in [6]) by first
obtaining (3.2) for functionals of the form $f(u_1, u_2, \cdots, u_{2k})$, where $u_i = \sup g(t)$
for $(i - 1)/k < t \le i/k$ and $u_{k+i} = \inf g(t)$ for $(i - 1)/k < t \le i/k$, $i = 1,
2, \cdots, k$, where $f(u_1, u_2, \cdots, u_{2k})$ as a function of its $2k$ variables is bounded
on the whole space, Borel measurable and Riemann integrable on every finite
$2k$-dimensional interval. Such a theorem is obtainable from (3.1), and moreover
these functionals can be used to approximate functionals $F[g]$ which are
bounded on $R$ and continuous in the uniform topology at almost all points of $Y$.
The approximation is such that (3.2) can be obtained for this latter class of
functionals. Finally, the assumption that $F[g]$ be bounded on $R$ may be removed,
and hence we can obtain the theorem stated above, by considering the
functional $e^{i \xi F[g]}$ and using the continuity theorem for characteristic functions.


REFERENCES

[1] J. L. Doob, “Heuristic approach to the Kolmogorov-Smirnov theorems,” Annals of
Math. Stat., Vol. 20 (1949), pp. 393-403.

[2] A. Kolmogorov, “Sulla determinazione empirica di una legge di distribuzione,” Giorn.
Ist. Ital. Attuari, Vol. 4 (1933), pp. 83-91.

[3] N. Smirnov, “Sur les écarts de la courbe de distribution empirique,” Rec. Math. (Mat.
Sbornik) (NS), Vol. 6 (1939), pp. 3-26.

[4] M. Kac, “On deviations between theoretical and empirical distribution functions,”
Proc. Nat. Acad. Sci., Vol. 35 (1949), pp. 252-257.

[5] K. L. Chung, “An estimate concerning the Kolmogorov limit distribution,” Trans. Am.
Math. Soc., Vol. 67 (1949), pp. 36-50.

[6] M. D. Donsker, “An invariance principle for certain probability limit theorems,”
Memoirs Am. Math. Soc., No. 6, 1951, 12 pp.


³ The space $Y$ is defined above just after (2.6).




A NOTE ON THE CONVOLUTION OF UNIFORM DISTRIBUTIONS¹


By Edwin G. Olds
Carnegie Institute of Technology


1. Introduction. Some time ago, when Dr. Acton and the present author were 
preparing a paper [1] on the combination of tolerances, the question arose as to 
the distribution of the sum of rectangular random variables having unequal 
bases. (For equal bases, the distribution has been known since Laplace.) As 
Acton pointed out, the distribution can be obtained by operational calculus. 
However, it seems useful to outline a derivation requiring only the well known 
formula for the probability density function for the sum of two random variables. 
In addition, this note gives several other results which may be needed in statis- 
tical quality control. 


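2. The distribution of the sum. Let $x_i$ be independent random variables with
probability density functions

          $f(x_i) = [\epsilon(x_i) - \epsilon(x_i - a_i)]/a_i, \quad (a_i > 0;\ i = 1, 2, \cdots, n)$,

where $\epsilon(x - c)$ is unity for $x \ge c$ and zero elsewhere. Let $s = \sum_1^n x_i$ and let
$f_n(s)$ and $F_n(s)$ represent the probability density function and cumulative
distribution function of $s$ respectively. Then it will be proved that

(1)    $f_n(s) = \Big[ s^{n-1} \epsilon(s) - \sum_i (s - a_i)^{n-1} \epsilon(s - a_i) + \sum_{i<j} (s - a_i - a_j)^{n-1} \epsilon(s - a_i - a_j) - \cdots$
       $\qquad + (-1)^n (s - \sum a_i)^{n-1} \epsilon(s - \sum a_i) \Big] \Big/ \Big[ (n - 1)! \prod a_i \Big]$,

(2)    $F_n(s) = \Big[ s^n \epsilon(s) - \sum_i (s - a_i)^n \epsilon(s - a_i) + \sum_{i<j} (s - a_i - a_j)^n \epsilon(s - a_i - a_j) - \cdots$
       $\qquad + (-1)^n (s - \sum a_i)^n \epsilon(s - \sum a_i) \Big] \Big/ \Big[ n! \prod a_i \Big]$.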
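Formulas (1) and (2) translate directly into a sum over subsets of the bases. The following is a minimal sketch for small n; the names `density` and `cdf` are illustrative, and the alternating sum may lose accuracy in floating point for large n.

```python
from itertools import combinations
from math import factorial, prod


def _signed_sum(s, a, power):
    """Sum over subsets J of (-1)^|J| (s - sum_J a)^power eps(s - sum_J a)."""
    total = 0.0
    for size in range(len(a) + 1):
        for subset in combinations(a, size):
            shifted = s - sum(subset)
            if shifted >= 0:  # eps(x - c) is unity for x >= c and zero elsewhere
                total += (-1) ** size * shifted ** power
    return total


def density(s, a):
    """f_n(s) of formula (1) for rectangular variables on (0, a_i)."""
    n = len(a)
    return _signed_sum(s, a, n - 1) / (factorial(n - 1) * prod(a))


def cdf(s, a):
    """F_n(s) of formula (2) for rectangular variables on (0, a_i)."""
    n = len(a)
    return _signed_sum(s, a, n) / (factorial(n) * prod(a))


if __name__ == "__main__":
    print(density(1.0, [1, 1]), cdf(1.0, [1, 1]))
    print(cdf(2.0, [1] * 8 + [2]))
```

For two unit bases this gives the familiar triangular density with value 1 at s = 1, and the last line evaluates (2) at the lower three-sigma limit of the nine-variable example used in Section 4.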


The proof is by induction. Using the convolution formula, ([2] p. 191), we 
have, in our notation, 


¹ Presented at the Annual Meeting of the Institute of Mathematical Statistics at Chicago,
December 29, 1950.

Some revision of this paper was made under partial support from a U. S. Air Force
contract with the Department of Mathematics of Carnegie Institute of Technology.







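(3)    $f_2(s) = \int_{-\infty}^{+\infty} f_1(s - t) f(t)\, dt$
       $= \frac{1}{a_1 a_2} \Big\{ \int_{-\infty}^{+\infty} \epsilon(t) \epsilon(s - t)\, dt - \int_{-\infty}^{+\infty} \epsilon(t) \epsilon(s - t - a_1)\, dt$
       $\qquad - \int_{-\infty}^{+\infty} \epsilon(t - a_2) \epsilon(s - t)\, dt + \int_{-\infty}^{+\infty} \epsilon(t - a_2) \epsilon(s - t - a_1)\, dt \Big\}$.

The integrand of the first integral within the braces is zero for $t < 0$ and for
$t > s > 0$ and is unity between zero and $s \ge 0$, so the effective limits are zero and $s$.
Likewise the effective limits for the second integral are zero and $s - a_1$. After
replacing $t - a_2$ by $t$ in the last two integrals and making obvious changes of limits,
we get

(4)    $f_2(s) = \frac{1}{a_1 a_2} \Big\{ \int_0^{s} \epsilon(t) \epsilon(s - t)\, dt - \int_0^{s - a_1} \epsilon(t) \epsilon(s - t - a_1)\, dt$
       $\qquad - \int_0^{s - a_2} \epsilon(t) \epsilon(s - t - a_2)\, dt + \int_0^{s - a_1 - a_2} \epsilon(t) \epsilon(s - t - a_1 - a_2)\, dt \Big\}$.

Finally,

(5)    $f_2(s) = [s \epsilon(s) - (s - a_1) \epsilon(s - a_1) - (s - a_2) \epsilon(s - a_2)$
       $\qquad + (s - a_1 - a_2) \epsilon(s - a_1 - a_2)]/(a_1 a_2)$.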


To complete the induction we need only assume that (1) holds for $n = k$ and
show that it then is true for $n = k + 1$. Using the same method for combining
the density functions of $s = \sum_1^k x_i$ and $x_{k+1}$ as was used above for $x_1$ and $x_2$,
this presents no difficulty. Also it is easy to show that (2) is a direct consequence
of (1).


3. Asymptotic normality. For use in the remaining sections it is noted that the
constants of the distribution of $s$ are:

(6)    mean, $\mu = \tfrac{1}{2} \sum a_i$; variance, $\sigma_s^2 = \sum a_i^2 / 12$; skewness, $\gamma_1 = 0$;
       excess, $\gamma_2 = -\tfrac{6}{5} \sum a_i^4 / (\sum a_i^2)^2$.

The matter of convergence of the classically normed sum to the Gaussian
distribution with zero mean and unit variance can be settled easily by using
the well known Lindeberg condition, the sufficiency of which, as Loève [3] notes,
was established by Lindeberg and by P. Lévy, and the necessity by Feller. (For
discussion and references see Loève’s paper.)

As the second part of the solution of the classical central limit problem Loève
([3], p. 326) states the theorem:

NC holds and $\max_{k \le n} \frac{\sigma(x_k)}{\sigma(s_n)} \to 0$ if, and only if, for every $\epsilon > 0$

          $\frac{1}{\sigma^2(s_n)} \sum_{k=1}^{n} \int_{|x| > \epsilon \sigma(s_n)} x^2\, dF_k(x + E x_k) \to 0$.



For our case

(7)    $\sigma(x_k) = \frac{a_k}{\sqrt{12}}, \qquad \sigma(s_n) = \sigma_n = \Big( \frac{1}{12} \sum_1^n a_k^2 \Big)^{1/2}$.

In order to specialize the above theorem to our use, we first establish the
following:

LEMMA. The Lindeberg condition holds if, for $k \le n$,

(8)    $\frac{\max a_k}{\sqrt{\sum_1^n a_k^2}} \to 0$.

To prove this lemma it is sufficient to note, first, that each term of the Lindeberg
sum is identically zero whenever $\epsilon \sigma(s_n)$ is greater than $\tfrac{1}{2} a_k$, and second, that the
condition imposed in the lemma implies the existence of an $N$ for any $\epsilon > 0$
such that for all $n > N$

(9)    $\frac{a_k}{\sqrt{\sum_1^n a_k^2}} < \frac{\epsilon}{\sqrt{3}}$.

The following theorem² can now be established.

THEOREM. A necessary and sufficient condition for the asymptotic normality of a
sum of independent rectangular random variables is

(10)    $\lim_{n \to \infty} \frac{\max_{k \le n} a_k}{\sqrt{\sum_1^n a_k^2}} = 0$.


To prove the sufficiency of this condition we note that by virtue of the above 
lemma, the Lindeberg condition is satisfied and then note, from the quoted 
theorem, that the Lindeberg condition implies normal convergence. For necessity, 
we note that if the condition fails then the Lindeberg condition must fail for, 
otherwise, the quoted theorem would lead to a contradiction. 

Of course the condition on $\max a_k$ implies that $(a_k/\sigma_n) \to 0$ and that $\sigma_n \to \infty$.
Thus any sum for which $a_{i+1} = r a_i$ will converge to normality only if $r = 1$,
since, for $r > 1$, $(a_n/\sigma_n) \not\to 0$, and for $r < 1$, $\sigma_n$ is bounded.
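The behaviour of condition (10) for geometrically related bases can be seen numerically. This is a sketch only; `geometric_bases` is a hypothetical helper that builds bases with $a_{i+1} = r a_i$, and the quantity printed is the ratio appearing in (10).

```python
import math


def lindeberg_ratio(a):
    """max_k a_k / sqrt(sum_k a_k^2), the quantity appearing in condition (10)."""
    return max(a) / math.sqrt(sum(x * x for x in a))


def geometric_bases(r, n, first=1.0):
    """Bases with a_{i+1} = r * a_i (a hypothetical test family)."""
    return [first * r ** i for i in range(n)]


if __name__ == "__main__":
    for r in (0.5, 1.0, 2.0):
        print(r, [round(lindeberg_ratio(geometric_bases(r, n)), 4) for n in (5, 10, 40)])
```

With equal bases the ratio is $n^{-1/2}$ and tends to zero; for a fixed ratio r different from 1 it stays bounded away from zero, so (10) fails.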

4. Percentage outside three-sigma limits. For statistical quality control there
is considerable interest in knowing the percentage of a distribution outside of
the limits $\mu \pm 3\sigma$. For any particular sum of rectangulars this percentage can
be calculated from equation (2) above. Often the total range of nonzero
probability for the sum will not exceed $6\sigma$ so that the required probability will be zero.


² The author is grateful to the referee for suggesting a slightly different form of this
theorem as an improvement of the author’s original treatment, which used Lyapunov’s
Theorem.







It is easy to verify that this condition will hold whenever $a_{i+1} = r a_i$ and either
$0 \le r \le 0.5$ or $r \ge 2$.

When the range for the sum is greater than $6\sigma$, an approximation to the
required percentage can be obtained from an Edgeworth series. (For discussion,
see [2], pp. 221-231.) Let $x$ be the standardized variable $(s - \mu)/\sigma_s$, $\Phi(x)$ the
normal distribution function, $\phi(x)$ the normal density function, and $\phi^{(i)}(x)$ its
$i$th derivative; then, following Cramér, we have approximately

(11)    $f(x) = \phi(x) - \frac{\gamma_1}{3!} \phi^{(3)}(x) + \frac{\gamma_2}{4!} \phi^{(4)}(x) + \frac{10 \gamma_1^2}{6!} \phi^{(6)}(x)$.

Then, integrating and substituting the pertinent values from (6) above, we have

(12)    $F(x) = \Phi(x) - \frac{\phi^{(3)}(x)}{20} \left[ \frac{\sum a_i^4}{(\sum a_i^2)^2} \right]$.

Since the lower three-sigma limit for $s$ corresponds to $x = -3$ we have, finally,
for the approximate percentage below this limit

(13)    $F(-3) = 0.00135 - 0.004 \left[ \sum a_i^4 / (\sum a_i^2)^2 \right]$,

where the bracketed quantity takes its minimum value, $n^{-1}$, when all of the
$a$’s are equal.

The multiplier of the bracket has been rounded off for easy use. A better value
for it is 0.0039885. Using this instead of 0.004 in (13), and making a comparison
of (2) and (13) when $a_i = 1$ $(i = 1, 2, \cdots, 8)$ and $a_9 = 2$, the result from the
former formula is 0.000694 and from the latter 0.000685. Using 0.004, (13) gives
the approximate value 0.00068.
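The numbers quoted for the Edgeworth approximation can be reproduced from (12). The sketch below assumes only the standard identity $\phi^{(3)}(x) = (3x - x^3)\phi(x)$ for the normal density; the function names are illustrative.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def third_derivative_pdf(x):
    """phi'''(x) = (3x - x^3) phi(x)."""
    return (3.0 * x - x ** 3) * normal_pdf(x)


def tail_approx(a, x=-3.0):
    """Approximation (12) to the distribution function of the standardized sum."""
    ratio = sum(v ** 4 for v in a) / sum(v ** 2 for v in a) ** 2
    return normal_cdf(x) - third_derivative_pdf(x) / 20.0 * ratio


if __name__ == "__main__":
    print(third_derivative_pdf(-3.0) / 20.0)   # multiplier of the bracket in (13)
    print(tail_approx([1] * 8 + [2]))          # eight unit bases and one base of 2
```

For the nine-variable example the bracketed ratio is 24/144 = 1/6, which is why the correction is about one sixth of the multiplier.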


5. An application. The natural tolerance limits for a controlled process often
are taken as $\mu \pm 3\sigma$. Let us suppose that the individual components are
symmetrically distributed originally and then are symmetrically truncated by
inspection with bases $a_i$. Birnbaum [4] has proved that the distribution of the
sum of the truncated variables is “more peaked” about the mean than the
distribution of the sum of rectangular variables with the same bases, where “more
peaked” means less probability for values more than any arbitrary distance from
the mean. Thus, as Birnbaum points out for the case of equal truncations, the
distribution of the sum of rectangulars can be used to get an upper bound for
useful probabilities required for the sum of the truncated variables. For
symmetric but unequal truncations, an upper bound to the percentage outside natural
tolerance limits can be calculated by using formula (2) above.

REFERENCES

[1] F. S. Acton and E. G. Olds, “Tolerances - additive or Pythagorean?” Industrial Quality
Control, Vol. 5 (1948), pp. 6-12.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[3] M. Loève, “Fundamental limit theorems of probability theory,” Annals of Math. Stat.,
Vol. 21 (1950), pp. 321-338.

[4] Z. W. Birnbaum, “On random variables with comparable peakedness,” Annals of Math.
Stat., Vol. 19 (1948), pp. 76-80.


A LOWER BOUND FOR A PROBABILITY MOMENT OF ANY ABSOLUTELY 
CONTINUOUS DISTRIBUTION WITH FINITE VARIANCE 


By Sigeiti Moriguti


University of North Carolina 


Summary. The greatest lower bound of the nth probability moment (1.1)
of a population with variance $\sigma^2$ is given by (3.4).


1. Introduction. The nth probability moment of a population with the
probability density function $f(x)$ is defined as

(1.1)    $\Omega_n = \int_{-\infty}^{\infty} [f(x)]^n\, dx$.

These functionals have drawn the attention of some authors (see for instance
[1] and the references given there) in connection with fitting frequency curves
by means of frequency moments. Also it is to be noted¹ that the cumulative
distribution function of the range $w$ of a sample of size $n$ from such a population
can be approximated, for small $w$, by $n \Omega_n w^{n-1}$.

In general, it is not necessary that $n$ in (1.1) be an integer. It may be any real
number. However we put the restriction $n > 1$ in the following. To be specific,
we take the population mean equal to zero. Moreover, we consider only
populations whose variance

(1.2)    $\sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\, dx$

is finite.
As the probability density function, $f(x)$ must satisfy the conditions

(1.3)    $\int_{-\infty}^{\infty} f(x)\, dx = 1$,
(1.4)    $f(x) \ge 0$.

Under these conditions, we try to find a lower bound for $\Omega_n$.
Incidentally, $\Omega_n$ has no finite upper bound, because it increases indefinitely as,
for instance, the probability concentrates more and more to a certain point.

2. Derivation of the extremal distribution. The calculus of variations suggests
equating to zero the first variation

(2.1)    $\delta \left[ \int [f(x)]^n\, dx - \lambda \int x^2 f(x)\, dx - \mu \int f(x)\, dx \right]$,


¹ The author in fact took up this problem at first in connection with his work on the
distribution of sample ranges. It was Professor Harold Hotelling who called the author’s
attention to probability moments in this relation.





where $\lambda$ and $\mu$ are the Lagrange multipliers. Thus we get as the characteristic
equation

(2.2)    $n [f(x)]^{n-1} - \lambda x^2 - \mu = 0$,

whence

(2.3)    $f(x) = \left( \frac{\lambda x^2 + \mu}{n} \right)^{1/(n-1)}$.

We should take $\lambda$ negative, and consequently $\mu$ positive. Then the solution (2.3)
is applicable in the interval $(-\sqrt{-\mu/\lambda},\ \sqrt{-\mu/\lambda})$. Outside of the interval,
$f(x)$ should be taken to be identically equal to zero.


TABLE 1

Reduced probability moment, $\Omega_n \sigma^{n-1}$

Order n    Lower bound    Normal distribution    Rectangular distribution    Asymptotic formula

   2       .26833         .28209                 .28868                      .2547
   3       .07599         .09189                 .08333                      .0735
   4       .02174         .03175                 .02406                      .0212
   5       .006245        .01133                 .006944                     .00613
   6       .001797        .004125                .002005                     .00177
   7       .0005174       .001524                .0005787                    .000511
   8       .0001491       .0005686               .0001671                    .000147
   9       .00004299      .0002139               .00004823                   .0000426
  10       .00001240      .00008094              .00001392                   .0000123


As a change of scale in measuring $x$ does not affect the result essentially, we
take the nonvanishing interval to be $(-1, 1)$, and write the solution in the form

(2.4)    $f(x) = c (1 - x^2)^{1/(n-1)}, \quad -1 \le x \le 1$,
         $f(x) = 0, \quad |x| > 1$,

where, by (1.3),

(2.5)    $\frac{1}{c} = B\left( \frac{1}{2}, \frac{n}{n-1} \right)$.

The variance of the distribution (2.4) is calculated as

(2.6)    $\sigma^2 = \frac{n - 1}{3n - 1}$.

The nth probability moment of the distribution (2.4) is

(2.7)    $\Omega_n = \frac{2n}{3n - 1}\, c^{n-1}$.





Therefore the reduced nth probability moment $\Omega_n \sigma^{n-1}$, which is invariant under
any linear transformation of $x$, is given for the distribution of the same type as
(2.4) by

(2.8)    $\Omega_n \sigma^{n-1} = \frac{2n}{3n - 1} \left[ \sqrt{\frac{n - 1}{3n - 1}} \Big/ B\left( \frac{1}{2}, \frac{n}{n-1} \right) \right]^{n-1}$.

That this value gives the lower bound for any population with finite variance is
to be proved in the next section.
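The reduced moment (2.8) is easy to evaluate with the gamma function, which gives a quick check against the tabulated values. The sketch below also evaluates the reduced moments of the normal, $n^{-1/2} (2\pi)^{-(n-1)/2}$, and of the rectangular distribution, $(2\sqrt{3})^{-(n-1)}$, for comparison; the function names are illustrative.

```python
import math


def beta(p, q):
    """Beta function B(p, q) via the gamma function."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


def reduced_moment_extremal(n):
    """Omega_n * sigma^(n-1) for the extremal density c (1 - x^2)^(1/(n-1)), n > 1."""
    sigma = math.sqrt((n - 1) / (3 * n - 1))
    return 2 * n / (3 * n - 1) * (sigma / beta(0.5, n / (n - 1))) ** (n - 1)


def reduced_moment_normal(n):
    """Omega_n * sigma^(n-1) for a normal population."""
    return n ** -0.5 * (2 * math.pi) ** (-(n - 1) / 2)


def reduced_moment_rectangular(n):
    """Omega_n * sigma^(n-1) for a rectangular population."""
    return (2 * math.sqrt(3)) ** (-(n - 1))


if __name__ == "__main__":
    for n in range(2, 11):
        print(n, reduced_moment_extremal(n), reduced_moment_normal(n),
              reduced_moment_rectangular(n))
```

For every order the extremal value lies below both comparison values, as the lower-bound property requires.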


3. Proof that the solution gives the lower bound. Let us denote the particular
probability density function (2.4) by $f_1(x)$, and compare the probability moment
$\Omega_{n1}$ for it with $\Omega_n$ for any distribution with probability density function $f(x)$ and
the same variance $\sigma^2$. From the normalizing condition and the assumed equality
of the variance, we get

(3.1)    $\int_{-\infty}^{\infty} [f(x) - f_1(x)]\, dx = 0, \qquad \int_{-\infty}^{\infty} x^2 [f(x) - f_1(x)]\, dx = 0$.

By virtue of these equations and taking account of (2.4), we can express the
difference $\Omega_n - \Omega_{n1}$ in the following way:

(3.2)    $\Omega_n - \Omega_{n1} = \int_{-\infty}^{\infty} \left( [f(x)]^n - [f_1(x)]^n \right) dx$
         $= \int_{|x| \le 1} \left( [f(x)]^n - [f_1(x)]^n - n [f_1(x)]^{n-1} [f(x) - f_1(x)] \right) dx$
         $\qquad + \int_{|x| > 1} \left( [f(x)]^n + n c^{n-1} (x^2 - 1) f(x) \right) dx$.

But Taylor’s expansion up to the second-order term provides the formula, for
any $f$ and $f_1$,

(3.3)    $f^n = f_1^n + n f_1^{n-1} (f - f_1) + \tfrac{1}{2} n (n - 1) f_2^{n-2} (f - f_1)^2$,

where $f_2$ is a value between $f$ and $f_1$. As both $f(x)$ and $f_1(x)$ are positive, the formula
(3.3) assures us that the integrand of the first integral in the last member of
(3.2) is nonnegative. Also the integrand of the second integral is obviously
nonnegative. Hence we get the conclusion $\Omega_n \ge \Omega_{n1}$, equality being satisfied only if
$f(x) = f_1(x)$.

In general, it is easily derived from the above and (2.8) that

(3.4)    $\Omega_n \ge \frac{2n}{3n - 1} \left[ \frac{1}{\sigma} \sqrt{\frac{n - 1}{3n - 1}} \Big/ B\left( \frac{1}{2}, \frac{n}{n-1} \right) \right]^{n-1}$.

Thus the lower bound of a probability moment of any absolutely continuous
distribution with finite variance $\sigma^2$ is given by the right-hand member of (3.4).
It is actually achieved by a distribution of the same type as (2.4).







4. Numerical results. Numerical values of the coefficient in (3.4) are tabulated
in Table 1, together with the corresponding values $n^{-1/2} (2\pi)^{-(n-1)/2}$ for normal and
$(2\sqrt{3})^{-(n-1)}$ for rectangular population. All these values approach 1 when $n \to 1$,
as might be expected from the fact that for any distribution $\Omega_1 = 1$. It is to be
noted that the curve for the lower bound would be fairly parallel in logarithmic
scale to the curve for rectangular population. In fact it is easily shown that when
$n$ becomes large the former is given by

(4.1)    $\Omega_n \sigma^{n-1} = \frac{1}{(2\sqrt{3})^{n-1}} \cdot \frac{2}{3} \exp\left[ -\frac{1}{3} + \psi\left( \frac{1}{2} \right) - \psi(0) \right] \left( 1 + O\left( \frac{1}{n} \right) \right)$
         $= \frac{e^{5/3}}{6\, (2\sqrt{3})^{n-1}} \left( 1 + O\left( \frac{1}{n} \right) \right) = \frac{0.8824}{(2\sqrt{3})^{n-1}} \left( 1 + O\left( \frac{1}{n} \right) \right)$,

where $\psi(x)$ denotes the digamma function $\Gamma'(x + 1)/\Gamma(x + 1)$. The first term
happens to be close to the true value even for small $n$ as we see in Table 1.
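The constant in the asymptotic formula can be traced back to (2.8). The following derivation is a sketch, writing $h = 1/(n-1)$ and using the digamma convention of the text, $\psi(x) = \Gamma'(x+1)/\Gamma(x+1)$; the closed form $e^{5/3}/6$ relies on the standard value $\psi(\tfrac12) - \psi(0) = 2 - 2\log 2$.

```latex
% Three factors of (2.8), with h = 1/(n-1) -> 0:
\frac{2n}{3n-1} = \frac{2}{3}\bigl(1 + O(1/n)\bigr),
\qquad
\Bigl(\frac{n-1}{3n-1}\Bigr)^{(n-1)/2}
  = 3^{-(n-1)/2}\Bigl(1 + \tfrac{2}{3}h\Bigr)^{-1/(2h)}
  = 3^{-(n-1)/2}\, e^{-1/3}\bigl(1 + O(1/n)\bigr),
% Beta function near its value B(1/2, 1) = 2:
B\bigl(\tfrac12,\, 1 + h\bigr)
  = \frac{\Gamma(\tfrac12)\,\Gamma(1+h)}{\Gamma(\tfrac32 + h)}
  = 2\exp\bigl\{h\,[\psi(0) - \psi(\tfrac12)] + O(h^2)\bigr\},
\qquad
B^{-(n-1)} = 2^{-(n-1)}\exp\bigl[\psi(\tfrac12) - \psi(0)\bigr]\bigl(1 + O(1/n)\bigr).
% Collecting the factors:
\Omega_n\sigma^{n-1}
  = \frac{1}{(2\sqrt3)^{\,n-1}}\cdot\frac{2}{3}\,
    \exp\bigl[-\tfrac13 + 2 - 2\log 2\bigr]\bigl(1 + O(1/n)\bigr)
  = \frac{e^{5/3}}{6\,(2\sqrt3)^{\,n-1}}\bigl(1 + O(1/n)\bigr),
\qquad \frac{e^{5/3}}{6} \approx 0.8824 .
```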


5. Acknowledgement. The author wishes to express his thanks to Professor 
Harold Hotelling for his kind supervision of the work. 


REFERENCE 


[1] Herbert S. Sichel, “The method of frequency-moments and its application to type
VII populations,” Biometrika, Vol. 36 (1949), pp. 404-425.


———— 


UNIFORMITY FIELD TRIALS WHEN DIFFERENCES IN FERTILITY 
LEVELS OF SUBPLOTS ARE NOT INCLUDED IN 
EXPERIMENTAL ERROR 


By G. A. BAKER 
University of California, Davis 
1. Introduction. The present note is confined to the consideration of two
randomized blocks with two subplots each. The usual mathematical model for
the analysis of variance of such an experiment assumes that

(1.1)    $v_{ij} = g + b_i + t_j + \epsilon_{ij}$,

where $v_{ij}$ is the yield of the $j$th variety in the $i$th block, and the block effect $b_i$
is the average for the subplots of the $i$th block. Any difference between $b_i$ and
the yield of subplots due to differences in fertility is one component of the random
parts, $\epsilon_{ij}$. The random parts, $\epsilon_{ij}$’s, are then assumed to be normally and
independently distributed with zero means and uniform variance. That these
assumptions may break down in many cases because of the magnitude and
nonrandomness of the differences between subplots has been indicated in a recent
paper [1]. It should be understood that it is practically impossible with our present
knowledge to determine the relative or absolute fertility levels of any set of plots,
so that the present discussion will add only to the background knowledge and
general understanding of the behavior of field trials. It is possible to discuss
the present simple case in some detail while the situation becomes more complex
with more degrees of freedom. (See [1], pages 64 and 65.) The effect of
randomization is considered, and it is found to have a “beneficial” effect in some cases, and
no effect in others.


2. Theoretical development. The following development follows closely
suggestions made by a referee, especially with respect to randomization. Let us
consider the following setup:

          Block 1      Block 2
          $v_{11}$     $v_{21}$
          $v_{12}$     $v_{22}$

The $v_{ij}$’s refer to observed yields of a uniformity trial. The subscript $i$ refers
to the block number and $j$ to a dummy variety. Since this is a uniformity trial
the $t_j$’s of equation (1.1) are zero. The plots we shall consider as being assigned
at random to the dummy varieties.

Let $\sigma_0^2$ be the assumed uniform error variance and $\xi_{ih}$ $(i = 1, 2;\ h = 1, 2)$ be
the “true” unknown fertility level in the $h$th subplot of the $i$th block. Let

(2.1)    $v_{ij} = \xi'_{ij} + x_{ij}$.

Then it is assumed that the $x_{ij}$’s are distributed as $N(0, \sigma_0^2)$. The $j$th “variety”
has equal chance of being assigned either to the first or to the second plot within
the block. Thus $\xi'_{ij}$ itself is a stochastic variate, with probability 1/2 of taking
the values $\xi_{i1}$, $\xi_{i2}$. Put

(2.2)    $b_i = \frac{\xi_{i1} + \xi_{i2}}{2}, \qquad d_i = \frac{\xi_{i1} - \xi_{i2}}{2} \qquad (i = 1, 2)$.

Since if variety 1 is assigned to a given plot in the $i$th block then variety 2 must
be assigned to the other plot we have

(2.3)    $v_{11} = b_1 + x_{11} + a_1, \qquad v_{21} = b_2 + x_{21} + a_2$,
         $v_{12} = b_1 + x_{12} - a_1, \qquad v_{22} = b_2 + x_{22} - a_2$,

where $a_i$ is a stochastic variate which takes the values $\pm d_i$ with equal
probabilities 1/2.
If we apply the conventional analysis of variance we obtain

(2.4)    $S_1 = \tfrac{1}{4} (v_{11} + v_{21} - v_{12} - v_{22})^2$,

(2.5)    $S_2 = \tfrac{1}{4} (v_{11} - v_{21} - v_{12} + v_{22})^2$,

where $S_1$ is the variety sum of squares and $S_2$ is the error sum of squares, each
with one degree of freedom. These expressions (2.4) and (2.5) in terms of $x_{ij}$’s
and $a_i$’s are

(2.6)    $S_1 = \tfrac{1}{4} (x_{11} + x_{21} - x_{12} - x_{22} + 2a_1 + 2a_2)^2$,

(2.7)    $S_2 = \tfrac{1}{4} (x_{11} - x_{21} - x_{12} + x_{22} + 2a_1 - 2a_2)^2$.

Put

(2.8)    $u = x_{11} + x_{21} - x_{12} - x_{22}$,
         $v = x_{11} - x_{21} - x_{12} + x_{22}$,
         $m'_1 = 2a_1 + 2a_2$,
         $m'_2 = 2a_1 - 2a_2$,

and we get

(2.9)    $F = \left( \frac{u + m'_1}{v + m'_2} \right)^2$,

where $u$ and $v$ are independent variates distributed $N(0, 4\sigma_0^2)$. Put

(2.10)    $m_1 = \xi_{11} - \xi_{12} - \xi_{21} + \xi_{22}$,
          $m_2 = \xi_{11} - \xi_{12} + \xi_{21} - \xi_{22}$,

and then the pair $(m'_1, m'_2)$ has the four possible values $(m_2, m_1)$, $(m_1, m_2)$,
$(-m_2, -m_1)$, $(-m_1, -m_2)$, each with probability 1/4.

If we used a systematic arrangement with the variety number the same as
subplot number instead of a randomized arrangement we would have

(2.11)    $F = \left( \frac{u + m_2}{v + m_1} \right)^2$

instead of (2.9). If we apply the result given in [2], especially equation (17) page
5, and transform to a new variable, we obtain the distribution of $F$ as


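(2.12)    $f(F, m_2, m_1) = \pi^{-1} (1 + F)^{-1} F^{-1/2} e^{-\frac{1}{2} (m_2^2 + m_1^2)/\sigma^2}$
          $\qquad + a (2\pi)^{-1/2} (1 + F)^{-1} F^{-1/2} \exp\left[ -\tfrac{1}{2} (m_1 F^{1/2} - m_2)^2 / (\sigma^2 (1 + F)) \right] \int_0^a N(x)\, dx$
          $\qquad + b (2\pi)^{-1/2} (1 + F)^{-1} F^{-1/2} \exp\left[ -\tfrac{1}{2} (m_2 + m_1 F^{1/2})^2 / (\sigma^2 (1 + F)) \right] \int_0^b N(x)\, dx$,

$0 < F \le \infty$, where

          $a = (m_2 F^{1/2} + m_1)/(\sigma (1 + F)^{1/2})$,
          $b = (m_1 - m_2 F^{1/2})/(\sigma (1 + F)^{1/2})$,
          $N(x) = (2\pi)^{-1/2} e^{-x^2/2}$.

We note that if both $m_1$ and $m_2$ are zero (2.12) reduces to the tabled $F$ distribution
which is used almost universally in testing the significance of “variety”
difference with 1 and 1 degrees of freedom.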





TABLE 1 
Distributions (2.12) and (2.13) for seven pairs of values of the parameters 


DESIGNATING NUMBER OF PARAMETRIC PAIR 


4(5+ 6)” 


42.374 
-902 
-109 
-483 
-633 
-010 
567 
285 
-188 
-137 
109 
.091 
-078 
-069 
-062 
-056 
-052 

0.048 
0.044 
0.041 
0.038 
0.036 
0.034 
0.032 
0.030 
0.029 
0.027 
0.026 
0.025 
0.024 
0.023 
0. (2)04 
0. (6)44 


80 
-00 
-20 
-40 
-60 
-80 
.00 
-20 
40 
60 
80 
-00 
100 
10, 000 


0 
0. 
0.08 
0. 
0. 
0. 
0. 
0. 
1. 
1. 
8. 
1. 
1. 
2. 
2 
2. 
en 
2 
3 
3 
3 
3 
3 
4 
4 
4 
4 
4 
5 


Limiting 
ratio® 3. 1.462 0.606 1.034 


a Distributions headed 2, 3, 5, and 6 are not randomized (2.12).

b Distributions (2 + 3)/2, (5 + 6)/2 are randomized (2.13).

c Distributions 1, 4, and 7 are identical under randomization and nonrandomization.

d Number in parenthesis indicates the number of omitted zeros, thus 0.(2)03 means 0.0003.

e Limiting ratio of the ordinates of the indicated distributions to the ordinates of the
conventional F distribution as F approaches zero and as F approaches infinity.


The distribution of $F$ when randomization is allowed is

(2.13)    $\tfrac{1}{2} \left( f(F, m_2, m_1) + f(F, m_1, m_2) \right)$,

since $f(F, m_2, m_1) = f(F, -m_2, -m_1)$.
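The densities (2.12) and (2.13) can be evaluated directly. The sketch below assumes the reading of (2.12) in which the numerator of the ratio has mean $m_2$, the denominator has mean $m_1$, and both have standard deviation `sigma`; the function names and the numerical integration helper `total_mass` are illustrative and not part of the paper.

```python
import math


def std_normal_integral(q):
    """Integral of N(x) = (2 pi)^(-1/2) exp(-x^2/2) from 0 to q (negative for q < 0)."""
    return 0.5 * math.erf(q / math.sqrt(2.0))


def f_systematic(F, m2, m1, sigma=1.0):
    """Density (2.12) of F under the assumed reading described above."""
    root = math.sqrt(F)
    scale = sigma * math.sqrt(1.0 + F)
    a = (m2 * root + m1) / scale
    b = (m1 - m2 * root) / scale
    common = 1.0 / ((1.0 + F) * root)
    var = sigma * sigma * (1.0 + F)
    t0 = common / math.pi * math.exp(-0.5 * (m1 * m1 + m2 * m2) / sigma ** 2)
    t1 = (a * common / math.sqrt(2.0 * math.pi)
          * math.exp(-0.5 * (m1 * root - m2) ** 2 / var) * std_normal_integral(a))
    t2 = (b * common / math.sqrt(2.0 * math.pi)
          * math.exp(-0.5 * (m2 + m1 * root) ** 2 / var) * std_normal_integral(b))
    return t0 + t1 + t2


def f_randomized(F, m2, m1, sigma=1.0):
    """Density (2.13): equal mixture of the two systematic arrangements."""
    return 0.5 * (f_systematic(F, m2, m1, sigma) + f_systematic(F, m1, m2, sigma))


def total_mass(m2, m1, sigma=1.0, steps=20000):
    """Midpoint-rule integral of (2.12) over F, using F = tan(theta)^2."""
    h = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        tan = math.tan(theta)
        jacobian = 2.0 * tan / math.cos(theta) ** 2  # dF / d(theta)
        total += f_systematic(tan * tan, m2, m1, sigma) * jacobian * h
    return total


if __name__ == "__main__":
    print(f_systematic(1.0, 0.0, 0.0), 1.0 / (2.0 * math.pi))
    print(total_mass(1.0, 0.5))
```

With both parameters zero the function reduces to the F density with 1 and 1 degrees of freedom, $1/(\pi F^{1/2} (1 + F))$, and the numerical integral gives a check that the density has unit mass.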
3. Discussion. To show how (2.12) and (2.13) may differ from the usually
assumed distribution we have considered the following seven pairs of values of
$m_1/\sigma$ and $m_2/\sigma$:







DESIGNATING 
NUMBER 





aoc WO hb = 


NonNnorr © © 


“J 


Selected ordinates for systematic and randomized procedures for these 7 pairs
of values are presented and compared in Table 1. It is seen that the tails of some
of the curves are much heavier than for case 1 ($m_1 = m_2 = 0$), indicating that
much larger values of $F$ are required for significance. On the other hand, some
of the tails are lighter than for case 1 so that smaller $F$-values are indicative of
significance at the usual levels. Randomization is effective in some cases in giving
a distribution that is closer to the conventional $F$ distribution than is the $F$
distribution for a systematic procedure.

It is easy to find the limiting values of the ratios of the ordinates of (2.12)
and (2.13) to the ordinates of the conventional $F$ distribution as $F$ approaches
0 and $\infty$ (same). These limiting values are also indicated in Table 1.

When (2.13) is a greatly curtailed distribution, making errors of the first kind
less probable than expected, then the probability of errors of the second kind may
be greatly enhanced.


REFERENCES

[1] G. A. Baker and F. N. Briggs, “Yield trials with backcross derived lines of wheat,”
Annals of Inst. of Stat. Math., Tokyo, Vol. 2 (1950), pp. 61-67.
[2] G. A. Baker, “Distribution of the means divided by the standard deviations of samples
from nonhomogeneous populations,” Annals of Math. Stat., Vol. 3 (1932), pp. 1-9.


A GENERALIZATION OF A THEOREM DUE TO MacNEISH¹


By K. A. Bush


Champlain College, State University of New York, and University of North Carolina 


1. Summary and introduction. In 1922 MacNeish [1] considered the problem
of orthogonal Latin squares and showed that if the number $s$ is written in
standard form:

$$s = p_0^{n_0} p_1^{n_1} \cdots p_e^{n_e},$$

where $p_0, p_1, \cdots, p_e$ are primes, and if

$$r = \min(p_0^{n_0}, p_1^{n_1}, \cdots, p_e^{n_e}),$$

then we can construct $r - 1$ orthogonal Latin squares of side $s$. An alternative
proof was also given by Mann [2]. At the April, 1950 meeting of the Institute
of Mathematical Statistics at Chapel Hill, North Carolina, R. C. Bose announced
an interesting generalization of this result [3] which is stated as a theorem in the
next section. The proof given here is simpler than Bose’s original proof and is
published at his suggestion.

¹ This note is a revision of one section of the author’s doctoral dissertation submitted to
the University of North Carolina at Chapel Hill.
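
As a worked illustration of MacNeish's bound, the following is a minimal sketch that factors $s$ into prime powers and returns $r - 1$. The function names are hypothetical and chosen only for this example; the sketch assumes $s \ge 2$.

```python
def prime_power_factors(s):
    """Return the prime-power factors of s, e.g. 12 -> [4, 3]."""
    factors = []
    p = 2
    while p * p <= s:
        if s % p == 0:
            q = 1
            while s % p == 0:
                s //= p
                q *= p
            factors.append(q)
        p += 1
    if s > 1:
        # Whatever remains is a single prime to the first power.
        factors.append(s)
    return factors


def macneish_bound(s):
    """Number r - 1 of orthogonal Latin squares of side s given by MacNeish's result."""
    return min(prime_power_factors(s)) - 1


if __name__ == "__main__":
    for side in (6, 9, 12, 72):
        print(side, macneish_bound(side))
```

For a side such as 6 the smallest prime-power factor is 2, so the bound guarantees only a single square, which is why the generalization in the next section is of interest.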


2. Bose’s generalization of MacNeish’s theorem. Let us consider a matrix
$A = (a_{ij})$, where each $a_{ij}$ represents one of the integers $0, 1, \cdots, s - 1$, with $N$
columns and $k$ rows. Consider all $t$-rowed submatrices of $N$ columns which can
be formed from this array by choosing any $t$ rows. Each column of the submatrix
can be regarded as an ordered $t$-plet. The matrix $A$ will be called an orthogonal
array $(N, k, s, t)$ of size $N$, $k$ constraints, $s$ levels, strength $t$ and index $\lambda$ if each
of the $\binom{k}{t}$ $t$-rowed submatrices that may be formed from $A$ contains every one
of the $s^t$ possible ordered $t$-plets each repeated $\lambda$ times. It is clear that we cannot
add rows indefinitely to the array and still preserve its orthogonal character.
We shall use the symbol $f(N, s, t)$ to denote the maximum number of constraints
that are possible.
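
The definition can be checked mechanically. The sketch below is an illustration only: the function name and the list-of-rows representation are assumptions made for this example. It returns the index $\lambda$ when the array has strength $t$, and `None` otherwise.

```python
from collections import Counter
from itertools import combinations


def orthogonal_array_index(rows, s, t):
    """rows: k lists of length N with entries in 0..s-1.

    Return the index lambda if every t-rowed submatrix contains each of the
    s**t ordered t-plets equally often; return None otherwise.
    """
    k = len(rows)
    n_cols = len(rows[0])
    if n_cols % s**t != 0:
        return None
    lam = n_cols // s**t
    for chosen in combinations(range(k), t):
        counts = Counter(tuple(rows[i][c] for i in chosen) for c in range(n_cols))
        if len(counts) != s**t or any(v != lam for v in counts.values()):
            return None
    return lam


if __name__ == "__main__":
    # A small array with N = 4 columns, k = 3 rows, s = 2 levels.
    example = [
        [0, 0, 1, 1],
        [0, 1, 0, 1],
        [0, 1, 1, 0],
    ]
    print(orthogonal_array_index(example, s=2, t=2))
```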

THEOREM. If $N_i$ is divisible by $s_i^t$ for $i = 1, 2, \cdots, u$, then

$$f(N_1 N_2 \cdots N_u,\; s_1 s_2 \cdots s_u,\; t) \ge \min(k_1, k_2, \cdots, k_u),$$

where $k_i = f(N_i, s_i, t)$.


PROOF. Let $N_i = \lambda_i s_i^t$. We shall proceed inductively, and we first establish
the relationship:

$$f(N_1 N_2,\; s_1 s_2,\; t) \ge \min(k_1, k_2).$$


Let us denote the orthogonal array with $N_1$ columns and $k_1$ constraints by
$A = (a_{ij})$ and the second array with $N_2$ columns and $k_2$ constraints by $B = (b_{ij})$.
We may regard the elements of these two arrays as elements of two additive
Abelian groups. Accordingly we may form the direct sum of these two groups.
There are $s_1 s_2$ elements in this sum, and we may represent any element of this
new group by the symbol $(a_{ij}, b_{lm})$ where $a_{ij}$ and $b_{lm}$ are elements of the two
modules. We now write the array with $N_1 N_2$ columns in the form

$$\begin{array}{ccccccc}
(a_{11}, b_{11}) & \cdots & (a_{1N_1}, b_{11}) & \cdots & (a_{11}, b_{1N_2}) & \cdots & (a_{1N_1}, b_{1N_2}) \\
\vdots & & \vdots & & \vdots & & \vdots \\
(a_{k1}, b_{k1}) & \cdots & (a_{kN_1}, b_{k1}) & \cdots & (a_{k1}, b_{kN_2}) & \cdots & (a_{kN_1}, b_{kN_2}),
\end{array}$$

where the elements of $A$ are used for the first component for the first $N_1$ columns
and for the first $k$ rows, where $k = \min(k_1, k_2)$. The construction is completed
in a similar manner for the next group of $N_1$ columns (not indicated in the array
above) and so on until $N_2$ groups of $N_1$ columns have been written down so that
$N = N_1 N_2$. On the other hand, the second component is taken directly from
the array $B = (b_{ij})$.

Now select any $t$ rows from the array so constructed. Any $t$-plet of the $b$ elements
is repeated $N_1$ times in each of $\lambda_2$ groups. Within each of these groups of
$N_1$ objects any particular $t$-plet of the $a$ elements occurs $\lambda_1$ times so that each
$t$-plet which is constructed from the compound elements occurs $\lambda_1 \lambda_2$ times. Thus
the new array is orthogonal.
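
The construction in the proof can be sketched in code. This is a minimal illustration under stated assumptions: the function names are hypothetical, arrays are stored as lists of rows, and the compound element $(a, b)$ is coded as the single integer $a \cdot s_2 + b$, which is only one convenient labelling of the $s_1 s_2$ levels.

```python
from collections import Counter
from itertools import combinations


def orthogonal_array_index(rows, s, t):
    """Return the index lambda of a strength-t array with s levels, or None."""
    n_cols = len(rows[0])
    if n_cols % s**t != 0:
        return None
    lam = n_cols // s**t
    for chosen in combinations(range(len(rows)), t):
        counts = Counter(tuple(rows[i][c] for i in chosen) for c in range(n_cols))
        if len(counts) != s**t or any(v != lam for v in counts.values()):
            return None
    return lam


def product_array(a_rows, b_rows, s2):
    """Combine two arrays row by row, keeping the first min(k1, k2) rows.

    Each column of B is paired with a full group of the N1 columns of A, so
    the result has N1 * N2 columns.
    """
    k = min(len(a_rows), len(b_rows))
    n1, n2 = len(a_rows[0]), len(b_rows[0])
    return [
        [a_rows[i][j] * s2 + b_rows[i][g] for g in range(n2) for j in range(n1)]
        for i in range(k)
    ]


# An array with N = 4, k = 3, s = 2, t = 2.
A = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]]

# An array with N = 9, k = 4, s = 3, t = 2, built from x, y, x + y, x + 2y (mod 3).
pairs = [(x, y) for x in range(3) for y in range(3)]
B = [
    [x for x, y in pairs],
    [y for x, y in pairs],
    [(x + y) % 3 for x, y in pairs],
    [(x + 2 * y) % 3 for x, y in pairs],
]

C = product_array(A, B, s2=3)

if __name__ == "__main__":
    print(len(C), len(C[0]), orthogonal_array_index(C, s=6, t=2))
```

Here the combined array has 36 columns, 6 levels, and min(3, 4) = 3 rows, in line with the inequality being established.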

We now adjoin the array $(N_3, k_3, s_3, t)$, where $k = \min(k_1, k_2, k_3)$, to the one
we have just constructed, by an analogous process. Continuing in this manner,
we reach our theorem. In particular if $t = 2$, and $\lambda_i = 1$ for $i = 1, 2, \cdots, u$,
we secure the MacNeish theorem (cf. [1]).

As an example of the use of our theorem, we can state as an illustrative result

$$f(72, 6, 2) \ge 4$$

since $f(3^2, 3, 2) = 4$, $f(2^3, 2, 2) = 7$ in accordance with results established in [4].
In the absence of this extension of the MacNeish result, it might have been 
supposed that there could be but three orthogonal rows for this case, since there 
are no orthogonal Latin squares of side 6. We cannot, however, conclude that 
the equality sign holds since counter examples have been given in [4]. 


REFERENCES 


[1] H. F. MacNeish, “Euler’s squares,” Annals of Mathematics, Vol. 23 (1922), pp. 221-227.
[2] H. B. Mann, “The construction of orthogonal Latin squares,” Annals of Math. Stat.,
Vol. 13 (1942), pp. 418-423.
[3] R. C. Bose, “A note on orthogonal arrays,” Annals of Math. Stat., Vol. 21 (1950), pp.
304-305 (Abstract).
[4] R. C. Bose and K. A. Bush, “Orthogonal arrays of strength two and three,” Annals of
Math. Stat., Vol. 22 (1951), p. 312 (Abstract).




ON A LIMITING CASE FOR THE DISTRIBUTION OF EXCEEDANCES, 
WITH AN APPLICATION TO LIFE-TESTING 


By Lee B. Harris


General Electric Company 


According to equation (4.12) of [1], the probability that in a future sample of
$N$ observations, taken from an unknown distribution of a continuous variate,
at most $x$ of them will exceed $x_m$, the $m$th highest observation in the trial sample
of $n$ observations, is given by

$$W(n, m, N, x) = 1 - \frac{\binom{N}{x+1}}{\binom{N+n}{x+1}}\, F_m(x + 1, -n, -n - N + x + 1, 1),$$

where $F_m$ is the sum of the first $m$ terms of the hypergeometric series, having the
parameters indicated in the parentheses. If we set $m = 1$, we find that the probability
of getting in a future sample of $N$ trials at most $x$ exceedances of the largest
value in a trial sample of $n$ observations is

$$W(n, 1, N, x) = 1 - \binom{N}{x+1} \Big/ \binom{N+n}{x+1}, \tag{1}$$

since $F_1 = 1$.


If $x$ and $N$ are both large, we can approximate the factorials in (1) with Stirling’s
formula, $a! \approx \sqrt{2\pi a}\,(a/e)^a$. Then (1) reduces to

$$1 - W(n, 1, N, x) \approx \left(1 - \frac{x+1}{N}\right)^{n} \left(1 - \frac{n}{N+n}\right)^{N+n} \left(1 + \frac{n}{N-x-1}\right)^{N+n-x-1} \sqrt{\left(1 - \frac{n}{N+n}\right)\left(1 + \frac{n}{N-x-1}\right)}. \tag{2}$$

Now consider the limiting case in which $N$ and $x$ both approach infinity in such
a way that $x = kN$. This is the case in which we wish to find the probability that
in a very large future sample at most a fraction $k$ of the observations will exceed
the largest value in the trial sample of $n$ observations. Considering each of the
factors on the right side of (2), we have

$$\lim_{x = kN \to \infty} \left(1 - \frac{x+1}{N}\right)^{n} = (1 - k)^{n},$$

$$\lim_{x = kN \to \infty} \left(1 - \frac{n}{N+n}\right)^{N+n} \left(1 + \frac{n}{N-x-1}\right)^{N+n-x-1} = e^{-n} e^{n} = 1,$$

$$\lim_{x = kN \to \infty} \sqrt{\left(1 - \frac{n}{N+n}\right)\left(1 + \frac{n}{N-x-1}\right)} = 1.$$

Hence,

$$\lim_{N \to \infty} W(n, 1, N, kN) = 1 - (1 - k)^{n}. \tag{3}$$


The probability density, which may be obtained from (3) by differentiation, 
is 
$$p(k) = n(1 - k)^{n-1}. \tag{4}$$
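
The limit can be compared numerically with the exact expression. The sketch below is illustrative only and its function names are hypothetical: it evaluates $W(n, 1, N, x) = 1 - \binom{N}{x+1} / \binom{N+n}{x+1}$ with exact integer binomial coefficients and sets it beside $1 - (1 - k)^n$ for $x = kN$.

```python
from math import comb


def w_exact(n, big_n, x):
    """Exact probability of at most x exceedances of the largest of n trial values."""
    return 1 - comb(big_n, x + 1) / comb(big_n + n, x + 1)


def w_limit(n, k):
    """Limiting probability 1 - (1 - k)**n as N grows with x = k * N."""
    return 1 - (1 - k) ** n


if __name__ == "__main__":
    n, k = 5, 0.1
    for big_n in (100, 1000, 10000):
        x = int(k * big_n)
        print(big_n, w_exact(n, big_n, x), w_limit(n, k))
```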

An interesting check on the consistency of the theory is a proof of (4) based
on Gumbel’s original discrete distribution of $x$. Setting $m = 1$ in equation (1.3)
of [1] we have

$$w(n, 1, N, x) = \frac{n}{N+n} \left[ \binom{N}{x} \Big/ \binom{N+n-1}{x} \right].$$

Note that the factor in brackets is the same as the last term on the right side of
(1) with $n$ replaced by $n - 1$ and $x + 1$ replaced by $x$. Hence for large $x$ and $N$,

$$w(n, 1, N, x) \approx \frac{n}{N+n} \left(1 - \frac{x}{N}\right)^{n-1} \left(1 - \frac{n-1}{N+n-1}\right)^{N+n-1} \left(1 + \frac{n-1}{N-x}\right)^{N+n-1-x} \sqrt{\left(1 - \frac{n-1}{N+n-1}\right)\left(1 + \frac{n-1}{N-x}\right)}.$$


By the same limiting procedure as before, 


$$\lim_{x = kN \to \infty} w(n, 1, N, kN) = \frac{n}{N}(1 - k)^{n-1}. \tag{5}$$

In any small interval $dk$, there are $N\,dk$ possible values that $x$ can assume; hence
the probability that $k$ lies in the interval $dk$ is

$$p(k)\,dk = \frac{n}{N}(1 - k)^{n-1}(N\,dk). \tag{6}$$

Therefore, $p(k) = n(1 - k)^{n-1}$. This is exactly the result given by equation (4),
but obtained in a somewhat different way.

From the symmetry of the problem, $\lim_{N \to \infty} W(n, 1, N, kN)$ is also the probability
that in a large future sample at most a fraction $k$ of the observations will be
less than the smallest observation in the original trial sample of $n$ units. Hence,
a life-test of $n$ units may be discontinued as soon as any unit fails and equation
(3) will give the probability that in the future at most $100k\%$ of the units will
fail in a time shorter than the length of the test. The graphs show $W$ as a function
of $k$ for various values of $n$.
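
As a worked use of equation (3) in the life-testing setting, the following sketch inverts $1 - (1 - k)^n$ to find the smallest number of units $n$ giving a stated probability. The function names and the example figures are assumptions chosen for illustration, not values from the note.

```python
from math import ceil, log


def prob_at_most_fraction(n, k):
    """Equation (3): limiting probability that at most a fraction k fail early."""
    return 1 - (1 - k) ** n


def units_needed(k, probability):
    """Smallest n with 1 - (1 - k)**n >= probability, for 0 < k < 1 and 0 < probability < 1."""
    n = max(1, ceil(log(1 - probability) / log(1 - k)))
    # Guard against floating-point rounding in the logarithms.
    while prob_at_most_fraction(n, k) < probability:
        n += 1
    while n > 1 and prob_at_most_fraction(n - 1, k) >= probability:
        n -= 1
    return n


if __name__ == "__main__":
    # Units to test so that, with probability 0.90, at most 10% of future
    # units fail sooner than the first failure observed in the test.
    print(units_needed(0.10, 0.90))
```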


REFERENCE 


[1] E. J. Gumbel and H. von Schelling, “The distribution of the number of exceedances,”
Annals of Math. Stat., Vol. 21 (1950), pp. 247-262.




CORRECTION TO “THE SAMPLING DISTRIBUTION OF THE RATIO 
OF TWO RANGES FROM INDEPENDENT SAMPLES” 


By Richard F. Link
Princeton University 


In the note mentioned in the title (Annals of Math. Stat., Vol. 21 (1950),
pp. 112-116) the distribution given for the above mentioned ratio when the
sample values are drawn from a rectangular distribution is correct only when
$R \le 1$. This is pointed out in an article by P. R. Rider (“The distribution of the
quotient of ranges in samples from a rectangular population,” Jour. Am. Stat.
Assn., Vol. 46 (1951), pp. 502-507), who also gives the correct density of the
ratio for $R \ge 1$. The correct cumulative distribution for $R \ge 1$ is

$$1 - R^{-n_2} \left\{ \frac{R\, n_2 n_1 (n_1 - 1)}{(n_1 + n_2 - 1)(n_1 + n_2 - 2)} - \frac{n_1 (n_1 - 1)(n_2 - 1)}{(n_1 + n_2)(n_1 + n_2 - 1)} \right\}.$$
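
A numerical check of this expression may be sketched as follows. The sketch assumes that $R$ is the range of a sample of size $n_1$ divided by the range of an independent sample of size $n_2$, both drawn from the same rectangular distribution; that reading of the notation, the function names, and the simulation settings are assumptions made for illustration.

```python
import random


def cdf_ratio_of_ranges(r, n1, n2):
    """Cumulative distribution of the ratio of ranges for r >= 1 (n1, n2 >= 2)."""
    if r < 1:
        raise ValueError("this expression applies only for r >= 1")
    first = r * n2 * n1 * (n1 - 1) / ((n1 + n2 - 1) * (n1 + n2 - 2))
    second = n1 * (n1 - 1) * (n2 - 1) / ((n1 + n2) * (n1 + n2 - 1))
    return 1 - r ** (-n2) * (first - second)


def simulated_cdf(r, n1, n2, trials=20000, seed=0):
    """Monte Carlo estimate of P(range1 / range2 <= r) for uniform samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s1 = [rng.random() for _ in range(n1)]
        s2 = [rng.random() for _ in range(n2)]
        if max(s1) - min(s1) <= r * (max(s2) - min(s2)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(cdf_ratio_of_ranges(1.5, 3, 4), simulated_cdf(1.5, 3, 4))
```

At $R = 1$ the values for $(n_1, n_2)$ and $(n_2, n_1)$ should sum to one, which gives a simple consistency check that does not depend on simulation.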


ABSTRACTS OF PAPERS 


(Abstracts of papers presented at the Blacksburg meeting of the Institute, March 19-21, 1952) 


1. On the Approximation of Sampling Distributions by Punch Card Methods. 
Carl F. Kossack and Lester L. Helms, Purdue University.


This paper presents a procedure for obtaining empirical distributions, by punch card 
methods, of statistics for which the exact distribution or a usable approximation has not 
been found. The mechanization of random sampling of a univariate population has been 
described and extended to random sampling of a correlated multivariate population whose 
covariance matrix is given. This procedure has been applied to Wald’s classification statistic 
in the univariate case, and the results noted. 


2. Resolvable Incomplete Block Designs with Two Replications. R. C. Bose
and K. R. Nair, University of North Carolina.


Incomplete block designs in which the blocks can be grouped in such a way that each
group contains a complete replication may be called resolvable designs. They are useful
from the point of view of recovery of inter-block information. It is therefore important to
investigate resolvable designs involving a few replications. In this paper we consider a
class of resolvable designs with two replications, which contains as a special case the well
known square and rectangular lattices with two replications. Given a symmetrical balanced
incomplete block design with $u$ treatments, and $r$ replications in which each pair occurs $\lambda$
times, we can use the incidence matrix $(n_{ij})$ of this design to form a design of this class in
the following way. Take a $u \times u$ square scheme, and in the cell $(i, j)$ put $x$ new treatments
when $n_{ij} = 1$, and $y$ new treatments when $n_{ij} = 0$. The total number of treatments obtained
in this way is $v = u[rx + (u - r)y]$. The design is now constructed by taking the
rows of the scheme for the blocks of the first replication, and the columns of the scheme
for the blocks of the second replication. It has been shown that both the intra- and inter-block
analysis can be carried out in a simple manner. The necessary formulae have been
given, and the computational procedure illustrated by working out a numerical example.
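
The construction described above can be sketched in code. This is an illustration only: the function name is hypothetical, treatments are simply labelled 0, 1, 2, ..., and the incidence matrix used in the example (the seven-point design with blocks $\{i, i+1, i+3\}$ modulo 7) is a choice made here, not one taken from the abstract.

```python
from collections import Counter


def two_replicate_design(incidence, x, y):
    """Fill a u x u scheme with x new treatments per 1-cell and y per 0-cell.

    Returns (v, row_blocks, column_blocks): rows form the first replication,
    columns the second.
    """
    u = len(incidence)
    next_label = 0
    cells = [[None] * u for _ in range(u)]
    for i in range(u):
        for j in range(u):
            size = x if incidence[i][j] == 1 else y
            cells[i][j] = list(range(next_label, next_label + size))
            next_label += size
    row_blocks = [[t for j in range(u) for t in cells[i][j]] for i in range(u)]
    col_blocks = [[t for i in range(u) for t in cells[i][j]] for j in range(u)]
    return next_label, row_blocks, col_blocks


# Symmetrical balanced incomplete block design with u = 7, r = 3.
u, r = 7, 3
blocks = [{i % 7, (i + 1) % 7, (i + 3) % 7} for i in range(7)]
incidence = [[1 if j in blocks[i] else 0 for j in range(u)] for i in range(u)]

v, rows, cols = two_replicate_design(incidence, x=2, y=1)

if __name__ == "__main__":
    print(v, len(rows), len(cols))
```

With $x = 2$ and $y = 1$ the count agrees with $v = u[rx + (u - r)y] = 7(6 + 4) = 70$, and every treatment falls in exactly one row block and one column block.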


3. Rank Analysis of Incomplete Block Designs. I. The Method of Paired
Comparisons. R. A. Bradley and M. E. Terry, Virginia Polytechnic
Institute.

True preferences or ratings $\pi_{1u}, \cdots, \pi_{tu}$, $\sum_i \pi_{iu} = 1$, are assumed to exist for $t$
treatments in the $u$th of $g$ groups of experimental data in an experiment involving paired
comparisons. For the $u$th group, the probability that treatment $i$ is “better” than treatment
$j$ when they appear in a pair is postulated to be $\pi_{iu}/(\pi_{iu} + \pi_{ju})$.

Three tests of hypotheses are available and estimates of the treatment ratings may be
obtained. The tests use likelihood ratio statistics to test (a) $H_0: \pi_{iu} = 1/t$, against
$H_1: \pi_{iu} = \pi_i$ for all $u$; (b) $H_0: \pi_{iu} = 1/t$, against $H_2: \pi_{iu} \ne 1/t$; and
(c) $H_0: \pi_{iu} = \pi_i$ for all $u$, against $H_3: \pi_{iu} \ne \pi_i$.

Small-sample distributions with tables are available for tests (a) and (b). In all three
tests limiting distributions are shown to be in the form of chi-square.
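
The postulated preference probability is easy to evaluate for a single group. The sketch below is illustrative only: the function names, the dictionary layout for the comparison counts, and the numerical ratings are all assumptions chosen for this example.

```python
from math import log


def prob_better(pi, i, j):
    """Probability that treatment i is judged better than treatment j."""
    return pi[i] / (pi[i] + pi[j])


def log_likelihood(pi, wins):
    """Log-likelihood of paired-comparison counts for one group.

    wins[(i, j)] is the number of pairs in which i was judged better than j.
    """
    return sum(count * log(prob_better(pi, i, j)) for (i, j), count in wins.items())


if __name__ == "__main__":
    ratings = [0.5, 0.3, 0.2]  # hypothetical ratings summing to 1
    counts = {(0, 1): 6, (1, 0): 4, (0, 2): 7, (2, 0): 3, (1, 2): 5, (2, 1): 5}
    print(prob_better(ratings, 0, 1))
    print(log_likelihood(ratings, counts))
```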


4. Multiple Regression with a Quantal Response. D. B. Duncan and R. C.
Rhodes, Virginia Polytechnic Institute.


The problem considered is that of fitting a maximum likelihood multiple regression 
equation to data in which the response is quantal, the probit transformation is appropriate 
and the number, r, of independent regression variates is not small. 

Iterative methods, for example the Bliss-Fisher method, are available, but these have 
been developed mainly for the case r = 1 and rapidly become impractical for cases r > 2. 

A method is developed based on (i) the approximation of the weighted deviations of the 
working probits from the provisional probits by linear functions of the provisional probits 
and (ii) the replacement of the independent $x$ variates throughout most of the procedure
by a linear function of them, termed a composite regression variate. These devices lead to 
a simple procedure and result in an estimated 70 to 90% saving in work. 


5. Rank Analysis of Incomplete Block Designs. II. The Method for Blocks of
Three. (Preliminary Report.) R. A. Bradley and M. E. Terry, Virginia
Polytechnic Institute.


The extensions of “Rank analysis of incomplete block designs. I. The method of paired
comparisons,” Abstract No. 3 above, to blocks of size three are presented. As before, true
preferences or ratings $\pi_{1u}, \cdots, \pi_{tu}$, $\sum_i \pi_{iu} = 1$, are assumed to exist for $t$ treatments
in the $u$th of $g$ groups. For the $u$th group the probability that treatment $i$ obtains top ranking
in the presence of treatments $j$ and $k$ is $\pi_{iu}/(\pi_{iu} + \pi_{ju} + \pi_{ku})$ and the probability that
treatment $j$ obtains rank 2, given that $i$ had rank 1, is $\pi_{ju}/(\pi_{ju} + \pi_{ku})$.

The three tests of hypotheses listed in the first paper are again developed. Tables are
under preparation but are not yet available or complete.
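
Multiplying the two probabilities stated above gives the probability of a complete ranking within a block of three. The following sketch, with a hypothetical function name and hypothetical ratings, computes it and confirms that the six possible rankings exhaust the probability.

```python
from itertools import permutations


def ranking_probability(pi, i, j, k):
    """Probability that i ranks first, j second and k third in the block {i, j, k}."""
    top = pi[i] / (pi[i] + pi[j] + pi[k])
    second_given_top = pi[j] / (pi[j] + pi[k])
    return top * second_given_top


if __name__ == "__main__":
    ratings = [0.5, 0.3, 0.2]  # hypothetical ratings summing to 1
    for order in permutations(range(3)):
        print(order, ranking_probability(ratings, *order))
```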


6. Limit Theorems Associated with Variants of the von Mises Statistic. M.
Rosenblatt, University of Chicago.


A multidimensional analogue of the von Mises statistic is considered for the case of 
sampling from a multidimensional uniform distribution. The limiting distribution of the 
statistic is shown to be that of a weighted sum of independent chi-square random variables 
with one degree of freedom. The weights are the eigenvalues of a positive definite symmetric 
function. A modified statistic of the von Mises type useful in setting up a two-sample test 
is shown to have the same limiting distribution under the null hypothesis (both samples 
come from the same population with a continuous distribution function) as that of the 
one-dimensional von Mises statistic. The paper makes use of elements of the theory of 
stochastic processes. 


7. A Modification of Schwarz’s Inequality with Applications to Distributions.
Sigeiti Moriguti, University of North Carolina and University of Tokyo.


Let $\Phi(t)$ be a function of bounded variation in the closed interval $[a, b]$ and continuous
at both ends. Then for any nondecreasing function $x(t)$ belonging to $L_2(a, b)$, and summable
with respect to $\Phi$,

$$\int_a^b x(t)\,d\Phi(t) \le \left( \int_a^b x(t)^2\,dt \right)^{1/2} \left( \int_a^b \phi(t)^2\,dt \right)^{1/2},$$

where $\phi(t)$ is the right-hand derivative of the “greatest convex minorant” of $\Phi(t)$.
This is proved and necessary and sufficient conditions for the equality to hold are also
given. Several examples of application to distribution problems in statistics are discussed.


8. Confidence Intervals of Fixed Geometric Size for Scale Parameters. (Preliminary
Report.) Lionel Weiss, University of Virginia.


A procedure is given for obtaining confidence intervals for parameters of scale with
confidence coefficient no less than $\beta$ and length no greater than A, where $\beta$ is any number
between 0 and 1 and A is any positive number. The procedure uses two samples, the size
of the second sample being a chance variable. It seems certain that there are other procedures
for the same purpose yielding a smaller expected number of observations, but
even in using the method given the problem of fixing the size of the first sample to minimize
the expected number of observations is tedious computationally. A comparison is
suggested between the expected number of observations and the number of observations
required when an upper bound for the scale parameter is known and a single sample is
used to get a confidence interval of at least a given confidence coefficient and of length
bounded by a given number.


9. On Lower Bounds of Powers of Certain Multivariate Tests. S. N. Roy, 
University of North Carolina. 


For multivariate normal populations tests of hypotheses were earlier offered for (i)
equality of two covariance matrices; (ii) independence of two sets of variates, and (iii) the
analysis of variance situation. Lower bounds of the powers of such tests are now discussed.
Here, for simplicity, under (iii) is considered the hypothesis of equality of respective
means for $k$ $p$-variate populations with a common covariance matrix $\Sigma$. Let $S_1$ denote
the “covariance matrix of the sample means,” $S_2$ the “pooled covariance matrix of sample
error,” $\Sigma_1$ the corresponding population matrix of means, $H_0$ the hypothesis (iii), and $H$ an
alternative. Then the critical region of the test at a level $\alpha$ is: $\theta_q \ge \theta_0$, where $\theta_0$ is given by
$P(\theta_q \ge \theta_0 \mid H_0) = \alpha$ and $\theta_q$ is the largest characteristic root of the matrix $S_1 S_2^{-1}$ (positive
semidefinite of rank $q = \min(p, k - 1)$, a.e.). For the power we have the following lower
bound:

$$P(\theta_q \ge \theta_0 \mid H) > 1 - \prod_{i=1}^{q} \{1 - P(\text{noncentral } F \ge \theta_0 \mid \theta_i)\},$$

the noncentral $F$ being with d.f. $(k - 1)$ and $(N - k)$ ($N$: total number of observations),
and $\theta_i$’s being the characteristic roots of the matrix $\Sigma_1 \Sigma^{-1}$ (positive semidefinite of rank,
say, $s \le q$). Similar lower bounds are also readily available for (i) and (ii).


10. Normal Multivariate Analysis and the Orthogonal Group. A. T. James, 
Princeton University. 


The relationship of the orthogonal group, and its two coset spaces, the Grassmann and 
Stiefel manifolds, to normal multivariate sampling theory is discussed. The use of the 
Blaschke differential forms to represent the invariant measures on the two manifolds is 
illustrated by a derivation of the well known distribution of the canonical correlation co- 
efficients in the null case. The distribution of n independent samples from a normal k-variate 
population is transformed into 3 independent distributions, viz., (a) essentially the Wishart 
distribution; (b) the distribution of the linear subspace spanned by the sample when
represented as $k$ vectors in $n$-space; this is given by the invariant measure in the Grassmann
manifold; (c) the invariant distribution of a $k \times k$ orthogonal matrix which
determines the orientation of the $k$ vectors in the $k$-dimensional linear subspace.


11. Exact Formulae in Sequential Analysis for Exponential Distributions. Johan
H. B. Kemperman, Purdue University.


Let $a > 0$ and $b > 0$. Let $X_1, X_2, \cdots$ be a sequence of independent random variables
with a common distribution (we assume $\Pr(X_i \ne 0) > 0$). Put $Z_n = X_1 + X_2 + \cdots + X_n$
and let $N$ be the random variable which takes the value $n$ if $-a < Z_k < b$ $(k = 1, \cdots, n - 1)$
and $Z_n \ge b$ or $Z_n \le -a$. We put $r_n = \Pr(N = n)$, $p_n = \Pr(N = n, Z_n \le -a)$ and
$q_n = \Pr(N = n, Z_n \ge b)$. Let $D$ be an open connected region in the complex $t$-plane containing
an interval $G$ on the imaginary axis. We suppose that there exists a function $\phi(t)$ which is
analytic in $D$ and which in $G$ takes the value $\phi(t) = E(e^{tX})$. Then, the function which for
$t$ in $G$ is defined by $r_n(t) = r_n E(e^{tZ_n} \mid N = n)$ can be extended to an analytic function $r_n(t)$
in $D$. Moreover, there exists a constant $\theta$ $(0 < \theta < 1)$ such that for each value $t$ in $D$ with
$|\phi(t)| \ge \theta$ we have $\sum_n r_n(t)\,\phi(t)^{-n} = 1$ (Wald’s fundamental identity). For the same values
$t$, this relation may be differentiated term by term with respect to $t$. This generalization is
used to obtain generating functions for $p_n$ and $q_n$ under certain conditions.


12. A Note on a Generalized Behrens-Fisher Problem. Henry Scheffé, Columbia
University.


An exact solution [Henry Scheffé, “On solutions of the Behrens-Fisher problem, based
on the $t$-distribution,” Annals of Math. Stat., Vol. 14 (1943), pp. 35-44; “A note on the
Behrens-Fisher problem,” Annals of Math. Stat., Vol. 15 (1944), pp. 430-432] of the
Behrens-Fisher problem, based on the $t$-distribution, is generalized to yield confidence intervals
for a linear combination of unknown parameters.


13. Large-Sample Confidence Intervals for Density Function Values at Percentage
Points. John E. Walsh, China Lake, California.


Let us consider a sample of size $n$ from a population with density function $f(x)$. Let $\theta_p$
represent the $100p\%$ point of this population. A class of “well behaved” density functions
is defined. This class seems to contain density functions which are capable of approximating
most practical situations of a continuous type for $.05 \le p \le .95$. This paper presents some
approximate confidence intervals for $f(\theta_p)$ for the case where $.05 \le p \le .95$ and the density
function is of the “well behaved” class. These results hold for values of $n$ which are only
moderately large. The exact value of a confidence coefficient is not known but is determined
within reasonably close limits. An approximate expression is obtained for deciding when $n$
is sufficiently large for application of these results. The minimum sample sizes required
depend on $p$ and the confidence coefficient; they range from around fifty to several thousand.
The confidence intervals are based on statistics of the form
$x[(p + \epsilon)n + C\sqrt{n}] - x[(p - \epsilon)n - C\sqrt{n}]$, where $x[z] = x[\text{integer nearest } z]$ and
$x[1], \cdots, x[n]$ are the sample values arranged in increasing order of magnitude. The
quantity $\epsilon$ is a small but fixed number depending on $p$, while $C$ is chosen so that a
confidence interval of the desired order of magnitude is obtained.
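
The statistic on which the intervals are based can be computed directly from the ordered sample. The sketch below shows only that computation, not the confidence interval itself; the function names, the clamping of indices to the range 1 to $n$, and the values of $p$, $\epsilon$ and $C$ in the example are assumptions made for illustration.

```python
from math import floor, sqrt


def order_value(sorted_sample, z):
    """Return x[integer nearest z], counting order statistics from 1."""
    index = floor(z + 0.5)
    # Assumption: indices falling outside 1..n are clamped to the nearest end.
    index = min(max(index, 1), len(sorted_sample))
    return sorted_sample[index - 1]


def spacing_statistic(sample, p, eps, c):
    """Compute x[(p + eps)n + c*sqrt(n)] - x[(p - eps)n - c*sqrt(n)]."""
    xs = sorted(sample)
    n = len(xs)
    upper = order_value(xs, (p + eps) * n + c * sqrt(n))
    lower = order_value(xs, (p - eps) * n - c * sqrt(n))
    return upper - lower


if __name__ == "__main__":
    data = [float(v) for v in range(1, 101)]
    print(spacing_statistic(data, p=0.5, eps=0.05, c=0.5))
```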


14. Sequential Sufficient Statistics. R. R. Bahadur, Delhi, India.


The author defines sequential sufficiency and gives some characterizations of it. Let
$x_1, x_2, \cdots$ be a sequence of abstract chance variables having a joint distribution $p$ belonging
to a family $P$ of probability distributions. For each $m$ let $X_{(m)}$ be the space of all
points $(x_1, x_2, \cdots, x_m)$, and let $t_m$ be a function on $X_{(m)}$ with arbitrary range such that
$t_m$ is a sufficient statistic for $P$ when the sample space is $X_{(m)}$. Then $(t_1, t_2, \cdots)$ is said
to be a sequential sufficient statistic if for any event $A$ depending only on $x_1, x_2, \cdots$ and
$x_m$ the conditional probability of $A$ given $t_{m+1}$ equals the conditional expectation given
$t_{m+1}$ of the conditional probability of $A$ given $t_m$ $(m = 1, 2, \cdots)$. The role of sequential
sufficient statistics in sequential decision problems has been described elsewhere [Raghu
Raj Bahadur, “On sufficiency and statistical decision functions,” Annals of Math. Stat.,
Vol. 22 (1951), pp. 609-610 (abstract)]. The main result established here is the following. If
$x_1, x_2, \cdots$ and $x_m$ are independently distributed and their joint distribution is absolutely
continuous with respect to a fixed $\sigma$-finite measure $\lambda_m$ $(p \in P;\ m = 1, 2, \cdots)$,
then $(t_1, t_2, \cdots)$ is a sequential sufficient statistic.


15. Some Powerful Rank Order Tests. Wassily Hoeffding, University of
North Carolina.


It is shown that in certain cases there exist nonparametric tests which depend only on
the ranks of the observations and whose power is arbitrarily close to the power of a standard
parametric test if the sample is sufficiently large. For example, let $(x_1, y_1), \cdots, (x_n, y_n)$
be a random sample from a continuous bivariate distribution. Let $H$ be the hypothesis
that $x$ and $y$ are independent. Let $r_i$ and $s_i$ be the respective ranks of $x_i$ and $y_i$. Let $h_n(k)$
be the expected value of the $k$th order statistic in a sample of $n$ observations from a normal
(0, 1) distribution. Let $c_n = \sum_{i=1}^{n} h_n(r_i) h_n(s_i)$. Let $k_n$ be the smallest number for which
the probability of $|c_n| > k_n$ does not exceed $\alpha$ when $H$ is true. Suppose that $(x, y)$ has a
bivariate normal distribution with correlation $\rho$ (which may depend on $n$), and that the
power of the standard product-moment correlation test of size $\alpha$ tends to a constant $\beta \le 1$
as $n \to \infty$. Then the power of the test which rejects $H$ if $|c_n| > k_n$ tends to the same limit
$\beta$. Similar results hold for two-sample tests, analysis of variance tests, etc. (Work sponsored
by the Office of Naval Research.)
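
The statistic $c_n$ of the example can be sketched as follows. Note one substitution: the exact expected order statistics $h_n(k)$ are replaced here by Blom's approximation $\Phi^{-1}\big((k - 0.375)/(n + 0.25)\big)$, which is not part of the abstract. The function names are hypothetical, and the critical value $k_n$ is not computed.

```python
from statistics import NormalDist


def approx_normal_scores(n):
    """Blom's approximation to h_n(1), ..., h_n(n) (an assumption, not the exact values)."""
    nd = NormalDist()
    return [nd.inv_cdf((k - 0.375) / (n + 0.25)) for k in range(1, n + 1)]


def ranks(values):
    """Ranks 1..n of the values, assuming no ties (continuous distribution)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def c_statistic(x, y):
    """c_n = sum over i of h_n(r_i) * h_n(s_i), with approximate scores."""
    n = len(x)
    h = approx_normal_scores(n)
    r, s = ranks(x), ranks(y)
    return sum(h[r[i] - 1] * h[s[i] - 1] for i in range(n))


if __name__ == "__main__":
    x = [0.3, 1.7, -0.4, 2.2, 0.9]
    y = [0.1, 1.1, -0.9, 1.9, 1.3]
    print(c_statistic(x, y))
```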


16. Confidence Bounds for a Set of Means. D. A. S. Fraser, University of 
Toronto. 


The following problem was suggested to the author by Professor John Tukey: given
$x_1, \cdots, x_n$ are normal and independent with means $\mu_1, \cdots, \mu_n$ and variance $\sigma^2$, to find an
upper confidence bound (or confidence interval) for the set of means $\mu_1, \cdots, \mu_n$. This
paper proves that, subject to mild restrictions on the type of bound, exact $\beta$-level confidence
bounds (or intervals) do not exist (unless $n = 1$ or $\beta = 0, 1$). Incidental to the proof, bounds
are obtained having at least $\beta$ confidence: they are $\max x_i + \lambda_{1-\beta}\sigma$ for the upper bound and
$(\min x_i + \lambda_{1-\frac{1}{2}(1-\beta)}\sigma,\ \max x_i + \lambda_{\frac{1}{2}(1-\beta)}\sigma)$ for the interval, where $\lambda_\alpha$ is the value exceeded with
probability $\alpha$ by a standardized normal variate.

If the $\mu$’s are values of the location parameter for a distribution with density $f_\mu(x) =
f(x - \mu)$, then a bound (interval) with at least $\beta$ confidence is obtained by using the above
formulas with $\sigma = 1$ and with $\lambda_\alpha$ defined as the $\alpha$ point of the distribution having $\mu = 0$.
If this class of distributions is bounded complete with respect to the location parameter
$\mu$ (using at least all $\mu$ less than, say, zero), then exact upper bounds do not exist.
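
For the normal case, the conservative bound and interval described in the first paragraph of this abstract can be evaluated as in the sketch below. The function names and the sample figures are hypothetical; $\lambda_\alpha$ is taken, as in the abstract, to be the value exceeded with probability $\alpha$ by a standardized normal variate.

```python
from statistics import NormalDist


def lam(alpha):
    """Value exceeded with probability alpha by a standardized normal variate."""
    return NormalDist().inv_cdf(1 - alpha)


def upper_bound(xs, sigma, beta):
    """Upper bound for all the means with confidence at least beta."""
    return max(xs) + lam(1 - beta) * sigma


def interval(xs, sigma, beta):
    """Interval containing all the means with confidence at least beta."""
    half = (1 - beta) / 2
    return min(xs) + lam(1 - half) * sigma, max(xs) + lam(half) * sigma


if __name__ == "__main__":
    observations = [1.0, 2.5, 0.3]  # hypothetical data with known sigma = 1
    print(upper_bound(observations, sigma=1.0, beta=0.95))
    print(interval(observations, sigma=1.0, beta=0.95))
```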





NEWS AND NOTICES 




Readers are invited to submit to the secretary of the Institute news items of interest.


Personal Items 


Dr. Carl B. Allendoerfer, formerly Professor of Mathematics at Haverford 
College, Pennsylvania, has accepted an appointment as Professor and Executive 
Officer of the Mathematics Department, University of Washington, Seattle. 

Mr. Ishver S. Bangdiwala, who came from India in 1950 to the University of
North Carolina to do graduate work in statistics, has now been appointed as 
Assistant Consulting Statistician at the Agricultural Experiment Station, Uni- 
versity of Puerto Rico, Rio Piedras, Puerto Rico. 

Dr. Archie Blake has accepted a position as Head of the IBM Computing 
Section, Cornell Aeronautical Laboratory, Buffalo, New York. 

Mr. K. A. Brownlee, formerly Chief, Test Design Branch, Plans and Evalua- 
tion Office, Dugway Proving Ground, Tooele, Utah, has joined the staff of the 
Committee on Statistics, University of Chicago, as an Assistant Professor. Mr. 
Brownlee’s work will be principally in the Committee’s Statistical Research 
Center. 

Dr. Enrique Cansado, Professor at the University of Madrid and Chief of the 
Methodological Section of the Instituto Nacional de Estadistica, Spain, has ac- 
cepted a visiting assistant professorship in the Department of Mathematics, 
University of California, Los Angeles, to teach Stochastic Processes and Calculus 
of Probability for the academic year 1951-1952. During the previous academic 
year he held a Del Amo Foundation fellowship for research and studies in the 
U.S.A. 

Dr. W. R. Church, Professor of Mathematics at the United States Naval 
Postgraduate School, has moved from Annapolis to Monterey, California. The 
Postgraduate School, which was founded at Annapolis and was a department 
of the U. S. Naval Academy, has been moved to Monterey. 

Dr. C. W. Cotterman, formerly Associate Geneticist at the Heredity Clinic, 
University of Michigan, has joined the staff of the School of Veterinary Medicine,
University of California, Davis. 

Dr. Edgar P. King, who received his doctor’s degree in mathematics in June, 
1951, from the Carnegie Institute of Technology, is now employed as a mathe- 
matical statistician in the Statistical Engineering Laboratory, National Bureau 
of Standards, Washington, D. C.

Dr. Kenneth H. Kramer, who received his doctor’s degree in mathematics in 
February, 1952, from the Carnegie Institute of Technology, is now employed by 
the Youngstown Sheet and Tube Company, Youngstown, Ohio, as a develop- 
ment engineer on the staff of the Director of Mill Research and Development. 
In this job, he will serve in a consulting capacity on general statistical applica- 
tions to quality control, production control and cost control. 

Dr. R. A. Leibler of the Armed Forces Security Agency has accepted a posi- 
tion as mathematician at the Sandia Corporation, Albuquerque, New Mexico. 







Dr. Ardie Lubin has left his position as Lecturer in Statistical Psychology at
the Institute of Psychiatry, University of London, and is now a Research
Psychologist with the Personnel Research Section, Adjutant-General’s Office, U. S.
Department of the Army.

Mr. John W. Morse, formerly Chief, Epidemiologic Studies Section, Venereal
Disease Division, Federal Security Building, Washington, D. C., has been trans- 
ferred to Chile as technical adviser in the field of health and vital statistics. The 
work is being sponsored by the United States TCA Point IV Program. 

Mr. James A. Pierce, who has been doing graduate work in mathematics and 
acting as Graduate Assistant at Purdue University, is now employed by Con- 
solidated Vultee Aircraft Corporation as an Aerophysics Engineer in an Arma- 
ment Evaluation and Vulnerability Section of the Technical Design Department. 

Dr. William J. Schull has accepted a position as Junior Geneticist at the Hered- 
ity Clinic, University of Michigan. He has spent the past two years in Japan as 
a statistical consultant with the Atomic Bomb Casualty Commission studying 
the effects of atomic radiation. 


Dwarka Nath Nanda 


Dwarka Nath Nanda died in Delhi, India, on March 9, 1952. He was the 
author of three contributions to multivariate statistical analysis, published in the 
Annals of Mathematical Statistics in 1948 and 1950. 

Dr. Nanda was born on April 11, 1916, received the B.Sc. and M.A. degrees in 
Mathematics from the University of Agra, served as Statistician in the Punjab 
Agricultural Department for five and a half years and in the Imperial Council 
of Agricultural Research for two years and became Director of Statistics for the 
state of Mayurbhanj. He studied two years at the University of North Carolina 
and received the Ph.D. degree there in Mathematical Statistics in 1948. 

On return from North Carolina he was appointed Assistant Professor of 
Statistics at the Indian Council of Agricultural Research. He relinquished this 
post in May, 1949, to take up the appointment of Senior Scientific Officer, Sta- 
tistics, at the Technical Development Establishment Laboratories, Kanpur, 
under the Ministry of Defence, Government of India. 

He is survived by his wife and three children. 


Wald Memorial Fund 


As a tribute in memory of the late Professor Abraham Wald, who was killed 
with Mrs. Wald in an airplane accident in India December 13, 1950, a group of 
friends and colleagues are establishing a fund to help in defraying the expenses of 
a college education for his two children, Betty, eight years old, and Bobby, now 
four. 

Trustees of the fund are Theodore W. Anderson, Howard Levene, and Morti- 
mer Spiegelman. Contributions may be made payable to WALD MEMORIAL 
FUND and may be sent to Howard Levene, Box 23 Fayerweather, Columbia 
University, New York 27, New York. 







NEWS AND NOTICES 


New Secretary-Treasurer 


Professor K. J. Arnold of the University of Wisconsin has been elected by the 
Council of the Institute of Mathematical Statistics as Secretary-Treasurer for 
the term July 1, 1952, to June 30, 1955. He has also been appointed to an associ- 
ate professorship at Michigan State College. The new business office of the 
Institute will be at the Department of Mathematics, Michigan State College, 
East Lansing, Michigan, during Professor Arnold’s term of office. Professor 
Arnold will continue to reside in Madison, Wisconsin, until the end of August, 
and will receive mail during July and August at North Hall, University of 
Wisconsin, Madison 6, Wisconsin. 


Preparation of Problem and Source Materials for the Mathematical 
Training of Social Scientists 


As readers of the Annals probably know, a Committee on the Mathematical 
Training of Social Scientists has been at work for some time. The Committee 
includes representatives from the following associations and societies: American 
Anthropological Association, American Economic Association, American 
Educational Research Association, American Farm Economics Association, American 
Political Science Association, American Psychological Association, American 
Sociological Society, American Statistical Association, Econometric Society, In- 
stitute of Mathematical Statistics, Mathematical Association of America, and 
Psychometric Society. 

As the result of a suggestion from this Committee, the Social Science Re- 
search Council is now sponsoring a small group to work during the summer of 
1952 at Dartmouth College, Hanover, N. H. This group will attempt to compile 
from the literature of the various social sciences lists of problems, extracts from 
sources, and references to sources that illustrate varieties of uses of mathematics 
in the social sciences. These compilations are expected to serve a number of im- 
portant ends—e.g., to provide mathematicians with material for use in texts and 
courses designed for social scientists, to indicate the general dimensions of the 
mathematical training appropriate for students of the social sciences now and 
in the future, and to facilitate the study of mathematics by social scientists for 
whom organized courses are not available. 

This Committee believes that the group referred to would find it most help- 
ful if it could have a wide variety of suggestions from the various areas con- 
cerned. A general appeal for such suggestions is hereby made. They should be 
sent to Professor William G. Madow, Chairman, Committee on the Mathemati- 
cal Training of Social Scientists, Baker Library, Hanover, N. H., up to August 
15, and thereafter University of Illinois, Urbana, Ill. 


Although the Committee does not wish to limit the suggestions to specific 
types of material, it would prefer greater emphasis on materials relating to the 
use of mathematics in the social sciences themselves than on those relating to 
statistics, since the materials necessary for statistics are better known. More- 
over, the Committee would suggest that those who respond not concern them- 
selves with questions of duplication of what others would say, but give as much 
information as possible. This first request for assistance is aimed at providing 
those who are interested in this subject with an opportunity to make their views 
known to the Committee in as general terms as they wish. 

Finally, the Committee would appreciate learning where programs of mathe- 
matical training intended for social scientists are now in existence or in process 
of development, and where mathematics at the level of the calculus or higher 
is required for undergraduate or graduate degrees in the social sciences or may 
be substituted for another requirement for a degree in a social science. 


Statistical Summer Session, July 29 to August 15, 1952 


The Department of Statistics and the Statistical Laboratory in cooperation 
with the Department of Mathematics and the Department of Industrial Engineering 
of the Virginia Polytechnic Institute will conduct a special statistical 
summer session, July 29 to August 15, 1952. The program will be for graduate 
students, research workers, and technicians in government and industry. Special 
offerings will be given in the statistics of taste testing, bio-assay, sampling, and 
engineering research and production. For further details write the Department 
of Statistics, Virginia Polytechnic Institute, Blacksburg, Virginia. 


Summer Seminar in Statistics 


The third meeting of the Summer Seminar in Statistics will take place on the 
campus of the University of Connecticut, Storrs, Connecticut, during the three 
weeks of August 4-22, 1952. There will be one or two seminar sessions each day 
and a clinic on the treatment of problems in applications. 

The first week, August 4-8, which will be devoted to the modifications of sta- 
tistical techniques appropriate for chemistry, is being organized by Cuthbert 
Daniel and W. L. Gore. The second week, August 11-15, will be divided into two 
parts. The latter part, devoted to applications of minimax techniques, is being 
organized by J. L. Hodges. The third week, August 18-22, will be divided into 
two parts. The first part will be devoted to follow-up studies as they arise in 
medicine; this is being organized by Irwin Bross. The second part will be devoted 
to applications in actuarial work; this is being organized by Mortimer Spiegel- 
man. Professor R. A. Fisher will be a member of the seminar during the first two 
weeks. 

Those interested in the subjects under discussion are invited to attend by the 
day, week or other period. (A nominal registration fee will be collected.) For 
further information on reservations for campus housing, write to the Secretary 
of the Seminar, Professor D. F. Votaw, Jr., 210 Leet Oliver Memorial Hall, 
Yale University, New Haven, Connecticut. Suggestions of problems which 
might be presented before the clinic may also be sent to Professor Votaw. 




Doctoral Dissertations in Statistics, 1951 


Listed below are the doctorates conferred during the year 1951 in the United 
States and Canada for which the dissertations were written on topics in statistics 
(or for a degree in statistics). The university, month in which degree was con- 
ferred, major subject, minor subject, and the title of the dissertation are given 
in each case if available. 

Helen Abbey (Doctor of Science in Hygiene), Johns Hopkins, June, major in 
biostatistics, “An Examination of the Reed-Frost Theory of Epidemics.” 

R. E. Bechhofer, Columbia, June, major in mathematical statistics, minor 
in statistical quality control, “The Effect of Preliminary Tests of Significance 
on the Size and Power of Certain Tests of Univariate Linear Hypotheses.” 

W. S. Connor, Jr., North Carolina, August, major in mathematical statistics, 
minor in economics, “The Structure of Balanced Incomplete Block Designs 
and the Impossibility of Certain Unsymmetrical Cases.” 

A. M. Dutton, Iowa State College, June, major in statistics, minor in genetics, 
“Statistical Analysis of Long-Term Agricultural Experiments.” 

L. A. Goodman, Princeton, October, major in mathematics, “The Estimation 
of Population Size Using Sequential Sampling Tagging Methods.” 

E. R. Immel, California, June, major in mathematics, “Problems of Estimation 
and of Hypothesis Testing Connected with Birth-and-Death Markov 
Processes.” 


G. B. Kallianpur, North Carolina, August, major in mathematical statistics, 
minor in mathematics, “Some Topics in the Theory of Stochastic Processes.” 

E. P. King, Carnegie Institute of Technology, June, major in mathematics, 
“The Operating Characteristic of the Control Chart for Sample Means when 
Process Standards Are Unspecified.” 

K. H. Kramer, Carnegie Institute of Technology, June, major in mathematics, 
“The Distribution of Range in Compositions of Normal Universes.” 

A. S. Littell (Doctor of Science in Public Health), Johns Hopkins, June, major 
in biostatistics, “Estimation of the T-Year Survival Rate from Follow-up Studies 
over a Limited Period of Time.” 

Paul Meier, Princeton, October, major in mathematics, “Weighted Means 
and Lattice Designs.” 

R. B. Murphy, Princeton, October, major in mathematics, “On Tests for 
Outlying Observations.” 

Ingram Olkin, North Carolina, June, major in mathematical statistics, minor 
in mathematics, “On Distribution Problems in Multivariate Analysis.” 

D. B. Owen, University of Washington, March, major in mathematics, “A 
Two Sample Test Procedure.” 

M. P. Peisakoff, Princeton, October, major in mathematics, “Transformation 
Parameters.” 

D. D. Rippe, Michigan, June, “Statistical Rank and Sampling Variation of 
the Results of Factorization of Covariance Matrices.” 







Milton Sobel, Columbia, January, major in mathematical statistics, minor in 
mathematics, “An Essentially Complete Class of Decision Functions for Certain 
Standard Sequential Problems.” 


W. F. Taylor, California, January, “On Tests of Hypotheses and Best 
Asymptotically Normal Estimates Related to Certain Biological Tests.” 

M. E. Terry, North Carolina, June, major in mathematical statistics, minor in 
experimental statistics and mathematics, “Some Rank Order Tests which are 
Most Powerful Against Specific Parametric Alternatives.” 

F. H. Tingey, University of Washington, August, major in mathematics, minor 
in meteorology, “Extension of Kolmogoroff Statistic to More Than One 
Dimension.” 




New Members 


The following persons have been elected to membership in the Institute 
(December 2, 1951 to March 1, 1952) 


Boretti, Lodovico, Ph.D. (Univ. of Genoa), Assistant Professor of Mathematical Statistics, 
Institute of Statistics, University of Genoa, Italy. 

Borsting, Jack R., B.A. (Oregon State College, Corvallis), Graduate Assistant, Department 
of Mathematics, University of Oregon, Eugene, Oregon. 

Bowman, John R., Ph.D. (Univ. of Pittsburgh), Head, Department of Research in Physical 
Chemistry, Mellon Institute, Pittsburgh 13, Pennsylvania. 

Bush, Kenneth A., Ph.D. (Univ. of North Carolina), Associate Professor, Department of 
Mathematics, State University of New York, Champlain College, Plattsburgh, New 
York. 

Busk, Thoger, B.Sc. (Univ. of Copenhagen), Statistician, W. H. O. Tuberculosis Research 
Office, Copenhagen, % Dr. Tvaergade 41, III, Copenhagen, K., Denmark. 

Ehrenfeld, Sylvain, A.M. (Columbia Univ.), Graduate student, Department of Mathemati- 
cal Statistics, Columbia University, 370 Columbus Avenue, New York 24, New York. 

Elkin, William F., M.S. (Univ. of Michigan), Graduate Research Assistant, Department 
of Biostatistics, School of Public Health, University of North Carolina, Chapel Hill, 
North Carolina. 

Gomberg, Louis, B.A. (New York University), Graduate Student, Department of Mathe- 
matical Statistics, Columbia University, 2120 Mapes Avenue, New York 60, New York. 

Green, William K., M.A. (Univ. of Illinois), Analyst, Department of Defense, Armed 
Forces Security Agency, Washington, D. C., Apartment 102, 6616 Willston Place, 
Falls Church, Virginia. 

Hagan, John S., M.S. (St. Louis Univ.), Graduate student, Department of Mathematical 
Statistics, Columbia University, New York 27, New York, 4426 Broadway, Kansas 
City 2, Missouri. 

Heimlich, C. Roger, Student, Laboratory Instructor for Machine Methods, Purdue Uni- 
versity, 726 N. Chauncey Avenue, West Lafayette, Indiana. 

Ito, Koichi, B.E. (Univ. of Tokyo), Graduate student, Department of Mathematics, Saint 
Louis University, *83 Vandeventer Place, St. Louis 8, Missouri. 

Jones, Wayne H., M.S. (Univ. of Chicago), Analytical Statistician, Personnel Research 
Section, Personnel Research and Procedures Branch, Adjutant General’s Office, 
Department of the Army, Washington, D. C., 915 Cofer Road, Falls Church, Virginia. 




Kozelka, Robert M., M.A. (Univ. of Minnesota), Graduate student, Department of Mathe- 
matics, Harvard University, 16 Usher Road, W. Medford 55, Massachusetts. 

Lamke, Tom A., Ph.D. (Univ. of Wisconsin), Research Specialist, Bureau of Research, 
Iowa State Teachers College, Cedar Falls, Iowa. 

Lehrer, Thomas A., M.A. (Harvard), Graduate student, Department of Mathematics, 
Harvard University, 6 Kirkland Road, Cambridge 38, Massachusetts. 

Lindquist, E. F., Ph.D. (State Univ. of Iowa), Professor of Education, College of 
Education, State University of Iowa, Iowa City, Iowa. 

Miller, Raphael, M.A. (Yale), Graduate student, Department of Mathematics, Yale Uni- 
versity, 104 York Square, New Haven, Connecticut. 

Miyashita, Totaro, M.A. (Univ. of Tokyo), Graduate student, Department of Economics, 
University of Chicago, Room 472, International House, 1414 East 59th Street, Chicago 
37, Illinois. 

Moore, Roger H., B.S. (Univ. of Oregon), Research Assistant and Graduate student, De- 
partment of Mathematics, University of Oregon, 1319 East 15th Street, Eugene, Oregon. 

Newell, Charles R., B.Sc. (Univ. of Toronto), Statistician, Norton Company, Chippawa, 
Ontario, 1761 Peer Street, Niagara Falls, Ontario, Canada. 

Nicholson, Wesley L., A.B. (Univ. of Oregon), Research Assistant and Graduate student, 
Department of Mathematics, University of Oregon, 446 E. 13th Street, Eugene, Oregon. 

North, John D., Chairman and Managing Director, Boulton Paul Aircraft, Ltd., Wolver- 
hampton, England. 

Palmer, Boyd Z., B.A. (Earlham College, Richmond, Indiana), Graduate student, Depart- 
ment of Mathematical Statistics, University of North Carolina, University Trailer 
Court, #95, Chapel Hill, North Carolina. 

Romero, Mario G., Student, George Washington University, Washington, D. C. and 
employee of Section on Mathematical Statistics, Bureau of Statistics, Direccion General 
de Estadistica, San Jose, Costa Rica, Central America. 

Rosenblatt, Harry M., B.S. (George Washington Univ.), Graduate student, Department 
of Mathematical Statistics, George Washington University, and Mathematical Statis- 
tician, Navy Department, Bureau of Ordnance, 506 Oglethorpe St., N. E., Washington 
D. C. 

Schmid, John, Jr., Ph.D. (Univ. of Wisconsin), Assistant Professor and Examiner, Board 
of Examiners, Room 5, Berkey Hall, Michigan State College, East Lansing, Michigan. 

Tsao, Chia K., M.A. (Univ. of Oregon), Graduate Assistant, Statistical Laboratory, and 
graduate student, Department of Mathematics, University of Oregon, Eugene, 
Oregon. 

Tucker, Howard G., M.A. (Univ. of Calif.), Graduate student, Department of Mathematical 
Statistics, University of California, 1074 Spruce Street, Berkeley 7, California. 

Uemura, Kazuo, Kogakushi (Univ. of Tokyo), Graduate student, Department of Mathe- 
matical Statistics, University of North Carolina, 226 B. Dorm, Chapel Hill, North 
Carolina. 

Walker, Andrew M., M.A. (Univ. of Cambridge), Research Assistant in Mathematical 
Statistics, Mathematics Department, The University, Manchester 13, England. 

Winston, Gerald, M.A. (Columbia Univ.), Chief Statistician, Research and Development 
Branch, Philadelphia Quartermaster Depot, 2800 S. 20th St., Philadelphia 45, Penn- 
sylvania. 

Zahl, Samuel, B.S. (Univ. of Chicago), Graduate student, Department of Mathematics, 
University of Chicago, 4340 S. Drexel Boulevard, Chicago, Illinois. 





BLACKSBURG MEETING 311 


REPORT OF THE BLACKSBURG MEETING OF THE INSTITUTE 


The fifty-first meeting of the Institute of Mathematical Statistics was held at 
the Virginia Polytechnic Institute, Blacksburg, Virginia on March 19-21, 1952, 
with the Biometric Society (Eastern North American Region). Ninety persons 
attended the meeting, including the following thirty-seven members of the 
Institute: 


R. L. Anderson, R. E. Bechhofer, Z. W. Birnbaum, R. C. Bose, R. A. Bradley, A. E. 
Brandt, G. L. Burrows, E. L. Cox, Gertrude Cox, B. de Loor, D. B. Duncan, Churchill 
Eisenhart, R. E. Greenwood, R. J. Hader, Boyd Harshbarger, Harold Hotelling, Paul 
Irick, A. T. James, A. W. Kimball, C. F. Kossack, T. E. Kurtz, R. F. Link, Paul Meier, 
Sigeiti Moriguti, M. H. Quenouille, S. N. Roy, Henry Scheffé, S. A. Schmitt, H. Fairfield 
Smith, Harry Smith, Jr., Henry Teicher, M. E. Terry, J. W. Tukey, G. W. Tyler, Kazuo 
Uemura, D. L. Wallace, Lionel Weiss. 


At the opening session, Wednesday morning, March 19, Professor M. H. 
Quenouille, Yale University, gave an address entitled The Consequences of Test- 
ing Significance. Professor J. M. Grayson, Virginia Polytechnic Institute, was the 
chairman and H. Fairfield Smith, North Carolina State College, was a discussant. 

The second session, held Wednesday afternoon, at which Professor C. F. 
Kossack, Purdue University, was chairman, was devoted to Multiple Compari- 
sons. At this session the following papers were given: 


1. On a Multiple Decision Procedure Associated with Certain Ranking Problems. Robert E. 
Bechhofer, Columbia University. 

2. The Multiple Comparisons Test for Separating Ranked Treatments in an Analysis of 
Variance. D. B. Duncan, Virginia Polytechnic Institute. 

3. Allowances for Various Types of Error Rates. John W. Tukey, Princeton University. 

4. Short Cuts to Allowances. R. F. Link and D. L. Wallace, Princeton University. 


The Thursday morning session was devoted to contributed papers of the Bio- 
metric Society. 

At 2:00 P.M., Thursday, Professor Robert J. Hader, North Carolina State 
College, gave an address on Double Sampling Acceptance Inspection on Meas- 
urable Quality Characteristics. Professor H. L. Manning, Virginia Polytechnic 
Institute, was chairman and Professor R. C. Bose, University of North Carolina, 
was a discussant. 


At 3:30 P.M., Thursday, a joint session for contributed papers of the Institute 
and the Biometric Society was held with Professor P. S. Dear, Virginia 
Polytechnic Institute, as chairman. The following papers were presented: 


1. On the Approximation of Sampling Distributions by Punch Card Methods. Carl F. 
Kossack and Lester L. Helms, Purdue University. 

2. Resolvable Incomplete Block Designs with Two Replications. R. C. Bose and K. R. 
Nair, University of North Carolina. 

3. Rank Analysis of Incomplete Block Designs. I. The Method of Paired Comparisons. 
R. A. Bradley and M. E. Terry, Virginia Polytechnic Institute. 







4. Multiple Regression with a Quantal Response. D. B. Duncan and R. C. Rhodes, Vir- 
ginia Polytechnic Institute. 


5. Rank Analysis of Incomplete Block Designs. II. The Method for Blocks of Three. 
Preliminary Report. (By title.) R. A. Bradley and M. E. Terry, Virginia Polytechnic 
Institute. 


At 7:00 P.M., Thursday, the banquet was held, with Professor Boyd 
Harshbarger, Virginia Polytechnic Institute, acting as chairman. Dr. Louis A. Pardue, 
Vice President of Virginia Polytechnic Institute, gave a welcome. Dr. H. N. 
Young, Director, Virginia Agricultural Experiment Station, gave an address 
entitled Administration of Research. 

The Friday morning session was devoted to contributed papers of the Insti- 
tute. Professor R. A. Bradley, Virginia Polytechnic Institute, was chairman. 
The following papers were presented: 


1. Limit Theorems Associated with Variants of the von Mises Statistic. M. Rosenblatt, 
University of Chicago. 

2. A Modification of Schwarz’s Inequality with Applications to Distributions. Sigeiti 
Moriguti, University of North Carolina and University of Tokyo. 

3. Confidence Intervals of Fixed Geometric Size for Scale Parameters. Preliminary Report. 
(By title.) Lionel Weiss, University of Virginia. 

4. On Lower Bounds of Powers of Certain Multivariate Tests. S. N. Roy, University of 
North Carolina. 

5. Normal Multivariate Analysis and the Orthogonal Group. A. T. James, Princeton 
University. 

6. Exact Formulae in Sequential Analysis for Exponential Distributions. Johan H. B. 
Kemperman, Purdue University. 

7. A Note on a Generalized Behrens-Fisher Problem. (By title.) Henry Scheffé, Columbia 
University. 

8. Large-Sample Confidence Intervals for Density Function Values at Percentage Points. 
(By title.) John E. Walsh, China Lake, California. 

9. Sequential Sufficient Statistics. (By title.) R. R. Bahadur, Delhi, India. 

10. Some Powerful Rank Order Tests. (By title.) Wassily Hoeffding, University of North 
Carolina. 

11. Confidence Bounds for a Set of Means. (By title.) D. A. S. Fraser, University 
of Toronto. 


At 2:00 P.M., Friday, Dr. John H. Curtiss, National Bureau of Standards, 
gave an address entitled Some Chain Functions Useful in the Monte Carlo 
Method. Dr. L. A. Pardue, Virginia Polytechnic Institute, was chairman. 

At 3:15 P.M., Friday, a session was held on Experimental Design, at which 
Mr. Glenn Burrows, Bureau of Agricultural Economics, acted as chairman and 
Professor R. C. Bose, University of North Carolina, was a discussant. The follow- 
ing papers were presented: 


1. Recent Developments in Incomplete Block Designs. K. R. Nair, Forestry Research In- 
stitute, Dehra Dun, India. 
2. Latinized Rectangular Lattices. Boyd Harshbarger, Virginia Polytechnic Institute. 


Boyd Harshbarger 
Assistant Secretary 





PUBLICATIONS RECEIVED 




Francisco Quiroz Cuaron, Tablas de Precios de Compra de Valores, Tomo I, Banco de 
Mexico, S.A., Mexico City, 1952, 507 pp. 

André G. Laurent, La Méthode Statistique dans l’Industrie, Presses Universitaires de 
France, Paris, 1950, 134 pp. 








ESTADISTICA 


Official Journal of the Inter American Statistical Institute 


Vol. IX, No. 34 March 1952 
Contents 


Utilización de los datos Censales en el Análisis de Mercados (Traducción) 
Philip M. Hauser 
Una Tasa Estandarizada para la Mortalidad, Definida en Unidades de Años de Vida 
(Traducción) William Haenszel 
Exposición sobre Investigaciones Enumerativas Generales—II (Traducción) 
Catherine Senf 
Is There Overlap in Requests from International Organizations for National Statistical 
Information? 
Características Demográficas: Principales Problemas Técnicos Relacionados con la 
Aplicación de Algunos Estándares Recomendados para el Censo de las Américas 
de 1950 
Terminology of Concepts Relating to the Economically Active Population 
Problemas Referentes a la Tabulación del Censo de Población de 1950; Parte I, 
Características Demográficas Seleccionadas 
A Program for Developing Industrial Statistics 
Tabulaciones Adicionales sobre Familias 


Editorial Notes. Institute Affairs. Statistical News. Publications 


Editor: Francisco de Abrisqueta 


Inter American Statistical Institute, c/o Pan American Union, Washington 6, D.C., U. S. A. 


JOURNAL OF THE June 1952 
AMERICAN STATISTICAL ASSOCIATION Vol. 47. No. 258 


1108 16th Street, N. W., Washington 6, D. C. 


An Analysis of Some Failure Data D. J. Davis 
Classification and Analysis of Partially Balanced Incomplete Block Designs with Two 
Associate Classes R. C. Bose and T. Shimamoto 
Replacing Variables in Correlation Problems Vincent I. West 
Operating Characteristics for Tests of the Stability of a Normal Population John E. Walsh 
Multiple Sampling of Attributes D. S. Robson and A. J. King 
Estimating the Product of Several Random Variables G. D. Shellard 
Some Principles of Processing Census and Survey Data 
Robert B. Voight and Martin Kriesberg 
Short-cut Methods of Estimating County Population Robert C. Schmitt 
Sampling Surveys in Central Africa J. R. H. Shaul 
Factors in the Accumulation of Social Statistics Solomon Fabricant 
Comparison of Selected Measures of Ability of Communities to Bear Tax Burdens 
Lorin A. Thompson 
A Critical Evaluation of Available Agricultural Statistics Ivan M. Lee 
Prepaid Medical Care as a Source of Morbidity Data Neva R. Deardorff 
Some Cases in Which Yates’ Correction Should not be Applied Edwin L. Crow 


The American Statistical Association invites as members all per- 
sons interested in: 

1. development of new theory and method 

2. improvement of basic statistical data 

3. application of statistical methods to practical problems. 







BIOMETRIKA 
A Journal for the Statistical Study of Biological Problems 


Volume 39 Contents Parts 1 and 2, June 1952 


1. On the concurrence of a set of regression lines. By K. D. TOCHER. 2. The covering circle of a sample 
from a circular normal distribution. By H. E. DANIELS. 3. The construction of balanced designs for 
experiments involving sequences of treatments. By H. D. PATTERSON. 4. The interpretation of inter- 
actions in factorial experiments. By E. J. WILLIAMS. 5. Some exact tests in multivariate analysis. By 
E. J. WILLIAMS. 6. Experimental designs for serially correlated observations. By R. M. WILLIAMS. 
7. Moment coefficients of the k-statistics in samples from a finite population. By JOHN WISHART. 8. 
Moment-statistics in samples from a finite population. By M.G. KENDALL. 9. The estimation of death- 
rates from capture-mark-recapture sampling. By P. A. P. MORAN. 10. Multifactorial designs of first 
order. By G. E. P. BOX. 11. On sampling from a population of rankers. By A. S.C. EHRENBERG. 
12. Tests of significance in canonical analysis. By F. H. C. MARRIOTT. 13. A sampling test of the χ² 
theory for probability chains. By M.S. BARTLETT. 14. On mathematical analysis of style. By WIL- 
HELM FUCHS. 15. Regression, Structure and functional relationship. Part II. By M. G. KENDALL. 
16. Least-squares estimation of location and scale parameters using order statistics. By E. N. LLOYD. 
17. The time intervals between industrial accidents. By B. A. MAGUIRE, E. S. PEARSON and A. H. A. 
WYNN. 18. Comparison of two approximations to the distribution of the range in small samples from 
normal populations. By E. S. PEARSON. 19. The frequency justification of certain sequential tests. By 
G. A. BARNARD. 20. MISCELLANEA. 21. REVIEWS 


The subscription price, payable in advance, is 45s. inland, 54s. export (per volume including postage). Cheques 
should be drawn to Biometrika and sent to “The Secretary, Biometrika Office, Department of Statistics, 
University College, London, W.C. 1.” All foreign cheques must be in sterling and drawn on a bank 
having a London agency. 


ECONOMETRICA 


Journal of the Econometric Society 
Contents of Vol. 20, April, 1952, include: 


A. Charnes, W. W. Cooper, and B. Mellon Blending Aviation Gasolines—A 
Study in Programming Interdependent Activities in an Integrated Oil Company 
A. Charnes Optimality and Degeneracy in Linear Programming 
M. J. Farrell Irreversible Demand Functions 
A. Dvoretzky, J. Kiefer, and J. Wolfowitz The Inventory Problem: I. Case 
of Known Distributions of Demand 
Michio Morishima Consumer’s Behavior and Liquidity Preference 
Herbert A. Simon On the Application of Servomechanism 
Theory in the Study of Production Control 
Frank E. Bothwell The Method of Equivalent Linearization 
Martin Shubik A Business Cycle Model with Organized Labor Considered 
Gerard Debreu Definite and Semidefinite Quadratic Forms 
M. Hatanaka Note on Consolidation Within a Leontief System 
Abraham Wald On a Relation Between Changes in Demand and Price Changes 
Report of the Louvain Meeting. Report of the New Delhi and Patna Meetings. Book 
Reviews. 


Published Quarterly Subscription rates available on request 


The Econometric Society is an international society for the advancement of economic theory in its 
relation to statistics and mathematics 


Subscriptions to Econometrica and inquiries about the work of the Society and the procedure in applying 
for membership should be addressed to William B. Simpson, Secretary, The Econometric Society, The 
University of Chicago, Chicago 37, Illinois, U. S. A. 





MATHEMATICAL REVIEWS 


A journal containing reviews of the mathematical literature 
of the world, with full subject and author indices 


Publication of this journal is sponsored by the American Mathe- 
matical Society, Mathematical Association of America, Institute of 
Mathematical Statistics, London Mathematical Society, Edinburgh 
Mathematical Society, Union Matematica Argentina, and others. 


Subscriptions accepted to cover the calendar year only. 
Issues appear monthly except July. $20.00 per year. 
Send subscription order or request for sample copy to 


AMERICAN MATHEMATICAL SOCIETY 
80 Waterman Street, Providence 6, Rhode Island 


JOURNAL OF THE 
ROYAL STATISTICAL SOCIETY 


Series B ( Methodological) 


Contents of Volume 13, No. 2, 1951 


DAVID G. KENDALL Some Problems in the Theory of Queues 
H. E. DANIELS The Theory of Position Finding. (With Discussion) 
). R. BUCKLAND A Review of the Literature of Systematic Sampling 
. P. GODAMBE On Two-stage Sampling 
. V. SUKHATME...On Certain Probability Distributions Arising from Points on a Line 
. G. Guest..The Estimation of Standard Error from Successive Finite Differences 
©. H. Simpson The Interpretation of Interaction in Contingency Tables 
. O. PATTERSON Complex Contingency Tables Treated by the Partition of χ² 
S. ROSENBAUM The Variance of Least-Square Estimates under Linear Restraint 
. D. PATTERSON Change-over Trials 
. M. GRUNDY 
A General Technique for the Analysis of Experiments with Incorrectly Treated Plots 
J. Fosney. Subjective Judgment in Statistical Analysis: An Experimental Study 

x. H. Jowett 
The Expression of the Complementary Outputs of Two 
Products in Terms of a Common Unit of Production Effort 
J. DURBIN AND A. STUART... Inversions and Rank Correlation Coefficients 
H. E. DANIELS... Note on Durbin and Stuart’s Formula for E(r,) 


The Royal Statistical Society, 4, Portugal Street, London, W.C.2. 





SKANDINAVISK 
AKTUARIETIDSKRIFT 


1951 - Parts 3 - 4 


Contents 


G. ARFWEDSON 
A Probability Distribution Connected with Stirling’s Second Class Numbers 

T. DALENIUS AND M. GURNEY The Problem of Optimum Stratification. II. 

HERMAN WOLD Demand Functions and the Integrability Condition 

G. Harpirz Post-war Mortality among Industrial Insured Lives in Norway 

S. VAJDA Analytical Studies in Stop-Loss Insurance 

KNUT MEDIN 
The Sickness Experience of the Valkyrian Insurance Company, 1929-1948 

PER OTTESTAD 
On the Test of the Hypothesis that the Probability of an Event Is Contained within 
Given Limits 


Annual subscription: 10 Swedish Crowns (Approx. $2.00). 
Inquiries and orders may be addressed to the Editor, 
SKARVIKSVAGEN 7, DJURSHOLM (SWEDEN) 


SANKHYA 


The Indian Journal of Statistics 
Edited by P. C. Mahalanobis 


Vol. 11, Part 2, 1951 


The Estimation of Parameters in Certain Stochastic Processes Henry B. Mann 
Statistical Inference Applied to Classificatory Problems C. Radhakrishna Rao 
Multivariate Binomial and Poisson Distributions A. S. Krishnamoorthy 
On Errors of Estimates in Various Types of Double Sampling Procedure K. C. Seal 
Estimation of Parameters from Incomplete Data with Application to Design of 
Sample Surveys Abraham Matthai 
Confluent Hypergeometric Function Pran Nath 
A Study of Recent Trend in Infantile Mortality Rates in Calcutta by Longitudinal 
Survey K. N. Mitra, B. Bhattacharya, K. Dey, C. S. Dawn, 
M. Opaptan, and A. K. Gayen 

On the Non-Existence of Certain Difference Sets for Incomplete Group Designs 
S. S. Shrikhande 

On the Non-Existence of Affine Resolvable Balanced Incomplete Block Designs 
S. S. Shrikhande 

A Note on the Power of the Best Critical Region for Increasing Sample Size 
D. Basu 

Some Further Results on Errors in Double Sampling Technique Chameli Bose 
A Note on Price-Wage Variations in Cottage and Factory Economy G. C. Mandal 


Annual subscription: 30 rupees 
Inquiries and orders may be addressed to the 
Editor, Sankhyā, Presidency College, Calcutta, India. 








